Июнь 2026: скидки на AI-модели и IDE — hardcore-матрица throughput, routing и remote Mac 7×24
К 17 июня 2026 ценовая война LLM измеряется не SWE-bench скриншотами, а billing throughput: OpenRouter фиксирует ~10,9T tokens у DeepSeek V4-Flash, V4-Pro навсегда на 25 % launch price, Copilot раздаёт летние credits до 31 августа, Cursor даёт −50 % новым через referral, Windsurf открывает SWE-1.5 на три месяца. Ниже — числа, таблицы latency/$, MoE routing, Batch API и почему agent gateway на удалённом Mac Apple Silicon умножает экономию, которую дают скидки.
1. Почему июнь 2026 — peak $/token efficiency
- DeepSeek MoE portfolio: Flash держит agent loops (~10,9T tokens/мес на OpenRouter), V4-Pro — reasoning steps по ¥0,025 cache hit — ~1/700 vs GPT-5.5 Pro cache input.
- IPO user land grab: OpenAI/Anthropic давят на retention; пауза SDK billing Claude (15.06) = defensive pricing signal.
- Enterprise budget exhaustion: WSJ — Uber-class компании исчерпали AI budget к апрелю; vendors trade margin за volume (−20–30 % usage → price cuts).
| Профиль | Throughput-фокус | Action |
|---|---|---|
| Solo dev | Low QPS, high IDE time | Cursor ref + DeepSeek API |
| Platform team | CI agent bursts | Copilot Enterprise credits до 08.31 |
| Infra engineer | MoE routing + cache | OpenClaw multi-model chain |
| Local inference | Metal t/s fallback | ds4 + remote Mac 128 GB UMA |
2. API layer: permanent cuts и ожидаемые
2.1 DeepSeek V4-Pro — permanent 75 % off (effective 25 % list)
С 31.05.2026 акция «четверть цены» не откатывается. Post-23.05 scaling: 500 concurrent, output speed bump. Agent architecture: Flash на inner loop, Pro на tool-planning — см. OpenRouter weekly routing.
| Metric | Price | Notes |
|---|---|---|
| Input cache hit | ¥0,025 / M | Near-zero marginal для RAG |
| Input cache miss | ¥3 / M | Cold prefix |
| Output | ¥6 / M | Cap max_tokens в prod |
2.2 OpenAI — WSJ cut + GPT-5.6
10.06 WSJ: internal «drastic» API cuts; Altman promises more $/capability. GPT-5.6 EoJune — market $5–8 in / $25–40 out. Сейчас: Batch −50 %, auto Prompt Caching −50–75 %, Nano $0.10/M для classification tier.
| Model | Input $/M | Output $/M | Context |
|---|---|---|---|
| GPT-5.5 | $5.00 | $30.00 | 128K |
| GPT-5.4 | $2.50 | $15.00 | 1M |
| GPT-4.1 Nano | $0.10 | $0.40 | 1M |
2.3 Gemini 2.5 — 1M context at floor price
| Model | Input | Output | Context |
|---|---|---|---|
| 2.5 Pro | $1.25–2.50 | $10.00 | 1M |
| 2.5 Flash | $0.30 | $2.50 | 1M |
| 2.5 Flash-Lite | $0.10 | $0.40 | 1M |
Google cache discount до 75 % — critical для long-context ingestion pipelines.
2.4 Claude — SDK billing pause 15.06
Planned SDK metered billing отложен — Pro $20 / Max $100–200 сохраняют programmatic quota. IPO pressure → временно status quo; готовьте fallback chain до новой политики.
3. IDE tier: Cursor, Copilot, Windsurf
3.1 Cursor referral — 50 % month 1
Limited rollout (May 2026 confirmed): Pro $20→$10, Pro+ $40→$20, Ultra $200→$100. Parallel agents (до 8), Privacy Mode — throughput IDE-side. Ref URL: cursor.com/signup?ref=….
3.2 GitHub Copilot — summer credit multiplier
| Plan | $/user/mo | Standard credits | Jun–Aug 2026 |
|---|---|---|---|
| Business | $19 | $19 | $30 (+58 %) |
| Enterprise | $39 | $39 | $70 (+79 %) |
Usage billing с 01.06; auto model pick −10 % credits; 1 credit = $0.01.
3.3 Windsurf SWE-1.5 — 3 months free
Near-frontier code model на всех tiers incl. Free (25 Cascade credits/mo). Cascade > Composer в autonomous multi-step; Arena — side-by-side model race. Сравнение: IDE matrix 2026.
4. Cost stack: −80 % без новых подписок
# Tiered routing pseudocode
if task.complexity < THRESHOLD:
route("deepseek-v4-flash") # or gemini-2.5-flash-lite
elif task.needs_reasoning:
route("deepseek-v4-pro") # cache-stable system prefix
else:
route("gpt-5.4") # fallback premium
| Lever | Platform | Savings |
|---|---|---|
| Prompt Caching | Anthropic / OpenAI / Google / DeepSeek | 50–90 % on prefix |
| Batch API | All major vendors | 50 % async 24h |
| Small model routing | Nano / Flash-Lite | 60–75 % on volume |
| Combined (100M tok/mo) | All above | ≈ −80 % |
5. Матрица акций — snapshot 17.06.2026
| Product | Deal | Delta | Deadline | Priority |
|---|---|---|---|---|
| DeepSeek V4-Pro | Permanent quarter price | −75 % | None | P0 deploy now |
| Cursor new user | Referral 50 % M1 | −50 % | Rolling | P1 verify ref |
| Copilot Business | $30 credits | +58 % | 2026-08-31 | P0 team |
| Copilot Enterprise | $70 credits | +79 % | 2026-08-31 | P0 team |
| Windsurf SWE-1.5 | 3 mo free | 100 % model | ~3 mo promo | P2 eval |
| Claude sub | SDK hike paused | Status quo | TBD policy | P1 use quota |
| OpenAI API | Expected cut | TBD | Late Jun/Jul | P2 watch |
| Gemini Flash-Lite | $0.10/M in | Floor | None | P1 long ctx |
6. Пять шагов: от акций к stable gateway
- Prioritize P0 deals: DeepSeek keys + Copilot team credits before Aug 31.
- Implement routing policy: OpenClaw provider chain; log per-model $/request.
- Enable cache + batch: stable system prompt hash; nightly jobs → Batch endpoints.
- Provision remote Mac: M4 Pro 32 GB+ UMA для parallel agents + optional local ds4 fallback on 128 GB node.
- launchd + rsync: gateway never sleeps; workspace sync без paste 200 KB в chat.
openclaw gateway install
openclaw gateway restart
rsync -avz -e "ssh -o ServerAliveInterval=60" ./repo/ user@remote-mac:~/workspaces/repo/
7. FAQ
DeepSeek latency under load? 500 concurrent post-scale; monitor P95 — при 429 failover на Flash tier.
Cursor ref ban risk? Official program — safe; avoid pirated activators.
Copilot credits auto? Yes Jun–Aug Business/Enterprise; Sept baseline.
Claude vs GPT for code throughput? Sonnet 4.x / V4-Pro best $/SWE-step; GPT-5.4 general reasoning.
After Windsurf 3 mo? SWE-1.5 bills normal credits — benchmark now.
8. Итог: скидки — multiplier, не substitute для infra
Июнь 2026 даёт редкий alignment: permanent API cut, summer IDE credits, Claude pause, Cursor halving. Без tiered routing вы сжигаете quota; без always-on host — теряете night-batch throughput когда MacBook уходит в sleep и gateway умирает mid-agent-loop.
Remote Mac Apple Silicon — Metal-native CI, Unified Memory для local fallback (ds4 matrix), launchd для OpenClaw/Cursor CLI/Claude Code. SFTP/rsync держит workspace consistent — меньше tokens на re-upload context.
SFTPMAC аренда удалённого Mac: Apple Silicon 7×24, SFTP/rsync, изоляция прав — чтобы июньские −75 % не упирались в sleep() вашего ноутбука.