GPT-5.6 Sol, Terra & Luna: Full Review, Benchmarks & Pricing
On June 26, 2026, OpenAI released GPT-5.6 as a three-model family named after solar-system bodies: Sol (Sun), Terra (Earth), and Luna (Moon). Flagship Sol tops TerminalBench 2.1 at 91.9% in Ultra multi-agent mode and hits 96.7% on cybersecurity CTF tasks. Access is constrained: roughly 20 government-cleared partners can preview via API and Codex while general ChatGPT users wait. This guide consolidates official announcements, System Card data, and reported benchmarks—pricing, Ultra architecture, Cerebras acceleration, policy friction, Mythos 5 comparisons, safety controls, and a five-step developer playbook—so you can choose a model before July general availability.
1. Three pain points GPT-5.6 creates for model selection
GPT-5.6 is not a minor point release. It simultaneously shifts coding agents, cybersecurity research economics, and API cost curves—while June's "super release month" left all three frontier labs partially blocked. Teams most often stumble on three issues:
- Treating limited preview as general availability. Only about 20 approved partners can call Sol, Terra, or Luna through API and Codex today. ChatGPT consumers still wait. Canceling GPT-5.5 production routes or pre-paying annual tiers before broad rollout risks a gap if July timelines slip.
- Underestimating Ultra mode token bills. Sol's 91.9% TerminalBench score depends on Ultra multi-agent parallelism—multiple sub-agents working in parallel inflate output tokens fast. Without per-mode budget caps, a single complex agent run can exhaust monthly API quotas overnight.
- Using a sleeping laptop as the Codex/Cursor evaluation host. Release windows demand continuous multi-step regressions, SFTP-synced logs, and artifact snapshots. A MacBook that suspends on lid-close produces "occasional green, mostly timeout" results unrelated to whether Sol is actually stronger than Mythos 5.
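
A per-mode budget cap of the kind the second point calls for can be sketched in a few lines. This is a minimal illustration, not an OpenAI feature: `ModeBudget`, `BudgetExceeded`, the mode labels, and the dollar caps are all hypothetical, and prices are passed in as dollars per 1M tokens.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a request would push a mode past its configured cap."""


@dataclass
class ModeBudget:
    # Caps are in US dollars per billing period; the values are illustrative.
    caps_usd: dict[str, float]
    spent_usd: dict[str, float] = field(default_factory=dict)

    def charge(self, mode: str, input_tokens: int, output_tokens: int,
               input_price: float, output_price: float) -> float:
        """Record one request and return the mode's running total in USD.

        Prices are dollars per 1M tokens. Nothing is recorded if the cap
        would be exceeded.
        """
        cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
        total = self.spent_usd.get(mode, 0.0) + cost
        if total > self.caps_usd[mode]:
            raise BudgetExceeded(f"{mode} cap of ${self.caps_usd[mode]:.2f} would be exceeded")
        self.spent_usd[mode] = total
        return total


budget = ModeBudget(caps_usd={"standard": 50.0, "ultra": 20.0})
budget.charge("ultra", 200_000, 500_000, input_price=5.0, output_price=30.0)
```

At Sol's listed $5 / $30 rates, that single run of 200,000 input and 500,000 output tokens costs $16, so a second identical run would raise `BudgetExceeded` against the $20 Ultra cap. Keeping Ultra on its own cap is the point: a runaway multi-agent job then cannot drain the budget shared with standard-mode traffic.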
2. Quick reference: Sol, Terra, Luna pricing and positioning
| Model | Tier | Input (per 1M tokens) | Output (per 1M tokens) | Highlight |
|---|---|---|---|---|
| GPT-5.6 Sol | Flagship | $5 | $30 | TerminalBench 2.1 world #1 at 91.9% (Ultra) |
| GPT-5.6 Terra | Balanced workhorse | $2.50 | $15 | Near GPT-5.5 quality at 50% lower cost |
| GPT-5.6 Luna | Lightweight / fast | $1 | $6 | High-frequency tasks; up to 80% cheaper than Sol |
Current status: Government review limits preview to roughly 20 trusted partners; broad access expected within weeks (July 2026). Reported context window: approximately 1.5M tokens, pending full System Card confirmation (up from GPT-5.5's 1M).
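
To make the price gaps concrete, the sketch below applies the table's per-million-token rates to one workload. The workload size (200M input and 40M output tokens per month) is a made-up example, and the keys `sol`, `terra`, and `luna` are plain labels rather than confirmed API model identifiers.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "sol": (5.00, 30.00),
    "terra": (2.50, 15.00),
    "luna": (1.00, 6.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the listed per-1M-token rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical monthly volume: 200M input tokens, 40M output tokens.
for model in PRICES:
    print(f"{model:>5}: ${workload_cost(model, 200_000_000, 40_000_000):,.2f}")
```

For this volume the totals come to $2,200 on Sol, $1,100 on Terra, and $440 on Luna, which matches the 50% and 80% savings quoted in the table. These are standard-mode figures; Ultra runs multiply output tokens and would need to be estimated separately.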
3. Release context: solar naming and government review
OpenAI's June 26 launch introduces its first solar-system naming scheme: Sol for flagship, Terra for balanced enterprise workloads, and Luna for lightweight automation. The rollout arrived under unusual friction.
A June 2, 2026 executive order gave the US government up to 30 days to review frontier models before public release—the first time Washington required a limited debut rather than immediate broad access. After OSTP and ONCD coordination, OpenAI agreed to preview GPT-5.6 with approximately 20 cleared partners. CEO Sam Altman cooperated while stating publicly that government pre-approval should not become permanent industry practice.
GPT-5.6 is also the first OpenAI product line where all three tiers—including entry-level Luna—trigger OpenAI's High cybersecurity capability rating.
4. Model deep dive: Max and Ultra modes
GPT-5.6 Sol — flagship
Sol targets the hardest workloads: advanced coding, long-horizon cybersecurity research, and multi-step agentic pipelines that require tool use, iteration, and coordination.
Two new reasoning modes:
- Max mode: Allocates additional inference time for accuracy-critical tasks where latency is secondary.
- Ultra mode: A multi-agent architecture—Sol decomposes complex work, dispatches parallel sub-agents, and merges results. This design drives the TerminalBench leap from 88.8% (standard) to 91.9% (Ultra).
Pricing matches GPT-5.5: $5 / $30 per million input/output tokens.
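
The decompose, dispatch, and merge flow described for Ultra mode is a fan-out/fan-in pattern. The sketch below shows that generic pattern only; OpenAI has not published Ultra's internals, and every function here (`decompose`, `run_sub_agent`, `merge`, `run_ultra_style`) is a hypothetical stand-in with no model calls.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(task: str) -> list[str]:
    # Placeholder planner: a real system would ask a model to propose subtasks.
    return [f"{task} :: part {i}" for i in range(1, 4)]


def run_sub_agent(subtask: str) -> dict:
    # Placeholder worker: stands in for one model call per sub-agent.
    return {"result": subtask.upper(), "output_tokens": len(subtask)}


def merge(results: list[dict]) -> dict:
    # Combine sub-agent outputs and total their token usage.
    return {
        "answer": "\n".join(r["result"] for r in results),
        "output_tokens": sum(r["output_tokens"] for r in results),
    }


def run_ultra_style(task: str, max_workers: int = 3) -> dict:
    subtasks = decompose(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the subtasks.
        results = list(pool.map(run_sub_agent, subtasks))
    return merge(results)


print(run_ultra_style("fix failing build"))
```

The merged result sums `output_tokens` across workers, which is why billing grows with the number of sub-agents even when wall-clock time does not.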
GPT-5.6 Terra — balanced
Terra is the default enterprise tier for customer support, internal tools, and document analysis at scale. Performance tracks GPT-5.5 while cutting cost 50%—the best price-performance ratio for high-volume API traffic. Pricing: $2.50 / $15 per million tokens.
GPT-5.6 Luna — lightweight
Luna optimizes for summarization, drafting, and routine automation with low latency. Notably, Luna is OpenAI's first non-flagship model rated High in both cybersecurity and biology capability assessments. Pricing: $1 / $6 per million tokens.
5. Benchmarks: TerminalBench, CTF, life sciences
TerminalBench 2.1 — coding agents
TerminalBench 2.1 spans 89 complex command-line planning tasks, measuring multi-step tool invocation, iterative repair, and task coordination under realistic agent constraints.
| Model | Score | Mode |
|---|---|---|
| GPT-5.6 Sol | 91.9% | Ultra (multi-agent) |
| GPT-5.6 Sol | 88.8% | Standard |
| Claude Mythos 5 | 88.0% | Standard |
| GPT-5.5 | 83.4% | Standard |
| Gemini 3.1 Pro Preview | 70.7% | Standard |
Sol displaced Mythos 5 from the top spot in just 17 days—Mythos 5 had claimed #1 on June 9.
Agent's Last Exam — long-horizon agents
| Model | Task completion (code mode) |
|---|---|
| GPT-5.6 Sol | 50.9% (first model above 50%) |
| GPT-5.6 Luna | Slightly above GPT-5.5 |
Cybersecurity: CTF and ExploitBench
| Model | CTF hit rate |
|---|---|
| Sol | 96.7% |
| Terra | 91.84% |
| Luna | 85.19% |
ExploitBench: Sol matches Anthropic's Mythos Preview while consuming roughly one-third the output tokens, materially lowering enterprise security-research spend.
Safety boundary: OpenAI testing on Chromium and Firefox codebases shows Sol can identify vulnerabilities and exploit primitives but cannot autonomously construct complete, weaponized exploit chains—keeping it below the "Cyber Critical" threshold in OpenAI's framework.
Life sciences: GeneBench v1 and HealthBench
- GeneBench v1 (genomics and quantitative biology): Sol matches or exceeds GPT-5.5 with fewer tokens.
- HealthBench Professional: Sol scores 60.5, an 8.7-point gain over GPT-5.5.
6. Cerebras 750 token/s acceleration (July 2026)
Starting July 2026, GPT-5.6 Sol on Cerebras hardware acceleration will reach up to 750 tokens per second for select enterprise deployments.
Context: most flagship models today output between 50 and 150 token/s. At 750 token/s, the time to generate a complete response of a given length can shrink to between one-fifth and one-fifteenth of today's, which is meaningful for streaming copilots and real-time agent loops. Initial access remains limited to vetted enterprise customers.
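
The 5x to 15x range follows directly from dividing response length by throughput. The sketch below works that arithmetic for a hypothetical 3,000-token response; it covers generation time only and ignores time to first token, queueing, and network latency.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream a response at a steady output rate."""
    return output_tokens / tokens_per_second


RESPONSE_TOKENS = 3_000  # hypothetical response length

for rate in (50, 150, 750):
    seconds = generation_seconds(RESPONSE_TOKENS, rate)
    print(f"{rate:>3} token/s -> {seconds:5.1f} s")
```

At these rates the same response takes 60 s, 20 s, and 4 s respectively, so the 750 token/s figure is 15 times faster than the low end of today's range and 5 times faster than the high end.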
7. Policy friction: the Big Three blocked in June
The June 2 executive order is non-mandatory on paper but created practical constraints: frontier labs faced up to 30 days of federal review before broad release. June was supposed to be AI's "super release month"; instead, all three leading labs hit delays.
| Company | Model | June 2026 status |
|---|---|---|
| OpenAI | GPT-5.6 Sol / Terra / Luna | Limited preview for ~20 approved partners |
| Anthropic | Claude Fable 5 / Mythos 5 | Forced offline June 12 under export-control order |
| Google | Gemini 3.5 Pro | Delayed to July; originally slated for June |
OpenAI's countermeasures include real-time abuse classifiers, account-level review, 700,000 A100-equivalent GPU hours of automated red teaming, universal jailbreak testing, and a dedicated high-reasoning model as a final filter layer before deployment.
8. GPT-5.6 Sol vs Claude Mythos 5
| Dimension | GPT-5.6 Sol | Claude Mythos 5 |
|---|---|---|
| TerminalBench 2.1 | 91.9% (Ultra) / 88.8% standard | 88.0% |
| ExploitBench | Parity with Mythos Preview at ~1/3 tokens | Data not publicly released |
| Input price | $5 / M tokens | Formerly $10 / M (currently offline) |
| Availability | Limited preview; broad access expected July | Offline due to export controls |
| Context window | ~1.5M tokens | 200K tokens |
Bottom line: Sol leads on TerminalBench and cost-efficient security research, at half Mythos 5's former input price. Fable 5 still holds edges on some benchmarks such as SWE-bench Pro, but remains unavailable. Full System Card comparisons will sharpen once OpenAI publishes complete public data.
9. Access timeline and Polymarket odds
Current phase (late June 2026):
- Approximately 20 government-cleared trusted partners access Sol, Terra, and Luna via API and Codex
- General ChatGPT users cannot select GPT-5.6 yet
Expected July 2026:
- ChatGPT rollout (Plus and Pro tiers first)
- Public API availability
- Cerebras-accelerated Sol for enterprise (up to 750 token/s)
Prediction markets: Polymarket prices roughly 87% probability that GPT-5.6 reaches general availability by July 31, 2026. Treat this as sentiment, not a service-level agreement.
10. Use-case recommendation matrix
| Your workload | Recommended model |
|---|---|
| Complex code generation, debugging, multi-step agents | Sol (Ultra mode) |
| Enterprise document analysis, support bots, bulk API calls | Terra |
| Summarization, drafting, routine automation | Luna |
| GPT-5.5-class quality on a tighter budget | Terra (comparable quality, 50% lower cost) |
| Latency-critical streaming apps (post-July) | Sol on Cerebras |
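
The matrix can be encoded as a small routing table so the choice lives in one place rather than in scattered call sites. This is a sketch under assumptions: the `Workload` names and the `(model, mode)` labels are invented for illustration and are not API identifiers, and the Luna fallback for latency-critical work before Cerebras access is a guess based on Luna's low-latency positioning, not a recommendation from the matrix.

```python
from enum import Enum


class Workload(Enum):
    CODING_AGENT = "complex code generation, debugging, multi-step agents"
    BULK_ENTERPRISE = "document analysis, support bots, bulk API calls"
    ROUTINE = "summarization, drafting, routine automation"
    BUDGET_GPT55_CLASS = "GPT-5.5-class quality on a tighter budget"
    LATENCY_CRITICAL = "latency-critical streaming apps"


# (model label, mode) per matrix row. Labels are placeholders, not API model IDs.
ROUTES = {
    Workload.CODING_AGENT: ("sol", "ultra"),
    Workload.BULK_ENTERPRISE: ("terra", "standard"),
    Workload.ROUTINE: ("luna", "standard"),
    Workload.BUDGET_GPT55_CLASS: ("terra", "standard"),
}


def pick_model(workload: Workload, cerebras_available: bool = False) -> tuple[str, str]:
    """Return the (model, mode) pair the matrix suggests for a workload."""
    if workload is Workload.LATENCY_CRITICAL:
        # Assumption: use Luna until Cerebras-backed Sol is reachable.
        return ("sol", "cerebras") if cerebras_available else ("luna", "standard")
    return ROUTES[workload]


print(pick_model(Workload.CODING_AGENT))
print(pick_model(Workload.LATENCY_CRITICAL, cerebras_available=True))
```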
11. Safety measures and capability guardrails
All three GPT-5.6 tiers carry High cybersecurity ratings—the first time Luna shares that classification with a flagship. OpenAI's deployment stack for this release includes:
- Real-time abuse classifiers monitoring API and product traffic
- Account-level review for high-risk usage patterns
- 700,000 A100-equivalent GPU hours of automated red-team evaluation
- Universal jailbreak and prompt-injection test suites
- A dedicated high-reasoning filter model as the final safety layer
Capability testing confirms Sol can surface vulnerability patterns in browser engine codebases but stops short of autonomously assembling full exploit chains—a deliberate guardrail that keeps the model below OpenAI's most severe cyber-risk tier while still enabling defensive security research.
12. Five-step developer checklist
Complete this baseline before GPT-5.6 general availability so release week is a controlled migration, not a fire drill:
- Lock production model routing. Keep GPT-5.5 or Claude Opus 4.8 as default. Issue sandbox API keys for Sol, Terra, and Luna with monthly caps; set a separate alert for Ultra multi-agent spend.
- Subscribe to official channels. Track OpenAI's blog, Platform docs, and Deployment Safety System Card. Do not re-route production based on Polymarket odds alone.
- Build an isolated evaluation sandbox. Run Codex CLI or a multi-model gateway on a dedicated branch with per-mode token metering for Ultra parallelism.
- Prepare internal benchmark suites. Three to five cases each for coding agents, CTF-style scans, and long-context RAG. Sync results via SFTP or rsync into versioned artifact directories for regression diffs.
- Deploy a 24/7 remote Mac node. Host Cursor, Codex, and benchmark scripts on always-on Apple Silicon so lid-close on a laptop does not break release-week continuous testing.
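
For the regression diffs in the fourth step, a comparison of two result files is often enough. The sketch below assumes each run is saved as a JSON object mapping a case ID to a pass/fail boolean; that file shape, the case IDs, and the function names are assumptions for illustration, not a format any tool in this guide emits.

```python
import json
from pathlib import Path


def load_results(path: Path) -> dict[str, bool]:
    """Read a run file shaped like {"case-id": true, ...}."""
    return json.loads(path.read_text())


def diff_runs(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict[str, list[str]]:
    """Classify cases that changed between a baseline run and a candidate run."""
    shared = baseline.keys() & candidate.keys()
    return {
        "regressed": sorted(c for c in shared if baseline[c] and not candidate[c]),
        "fixed": sorted(c for c in shared if not baseline[c] and candidate[c]),
        "missing": sorted(baseline.keys() - candidate.keys()),
    }


# Hypothetical results for a GPT-5.5 baseline and a Sol candidate run.
baseline = {"agent-01": True, "ctf-01": True, "rag-01": False, "rag-02": True}
candidate = {"agent-01": True, "ctf-01": False, "rag-01": True}
print(diff_runs(baseline, candidate))
```

Reporting `missing` separately matters on an unreliable host: a case that never ran because the machine slept is a different problem from a case the model failed.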
13. Frequently asked questions
Is GPT-5.6 in ChatGPT today?
Not for general users. Roughly 20 cleared partners have API and Codex access; ChatGPT rollout is expected within weeks, likely July.
What is Sol's Ultra mode?
Ultra deploys parallel sub-agents that divide complex tasks and merge outputs—key to the 91.9% TerminalBench score, with significantly higher token consumption than standard mode.
Is GPT-5.6 better than Claude Fable 5 for coding?
Sol leads Mythos 5 on TerminalBench 2.1 (91.9% in Ultra mode vs 88.0%). Fable 5 retains SWE-bench Pro advantages but is offline. Sol's input price is half Mythos 5's former rate.
Are all three models safe to deploy?
All three rate High for cybersecurity capability, but OpenAI confirms they cannot autonomously build complete weaponized exploit chains. Classifiers and red-team testing are live in preview.
How fast is the July Cerebras build?
Up to 750 token/s—about 5 to 15 times faster than typical 50–150 token/s flagship output—initially for select enterprise customers.
14. Summary: capability gains meet an always-on Mac bottleneck
GPT-5.6 advances on three axes at once: capability (Sol Ultra dethroned Mythos 5 on TerminalBench in 17 days), efficiency (ExploitBench parity at one-third the tokens), and speed (July Cerebras at 750 token/s). The June government review also set a precedent—frontier models may face mandatory preview windows—that could reshape how every lab ships frontier weights.
Reading benchmark tables does not automatically stabilize your Codex or Cursor pipeline on day one of general availability. Ultra multi-agent evaluations, SFTP-synced logs, and overnight regression suites need always-on, low-latency, native macOS tooling. Intermittent laptops or undersized cloud VMs show "occasional pass, mostly timeout" results during the preview and the Cerebras staged-rollout windows, regardless of whether Sol truly scores 91.9%.
If you are preparing staged-rollout testing for GPT-5.6, the practical next step is to land Cursor, Codex CLI, and evaluation artifacts on a persistent Apple Silicon node with SFTP/rsync rollback. SFTPMAC remote Mac rental targets AI agent and Codex benchmarking: native Xcode and Metal parity, 24/7 launchd supervision, low-latency API callbacks, and operational baselines aligned with our GPT-5.5 and Claude migration guides. That makes it a better fit than a home Mac pulling double duty as both daily driver and release-week evaluation host.