Why does Anthropic earn 46% of OpenRouter dollars on only 12% of tokens?

Opus 4.6 lists at $5 input and $25 output per million tokens, an order of magnitude above commodity Chinese models. The premium lane and the cheap lane both grew, but Anthropic captured pricing power while losing volume share.

Should I route through OpenRouter or hit official APIs directly from OpenClaw?

OpenRouter is excellent for fast experimentation and automatic provider failover. For high-volume production or compliance-bound traffic, prefer direct API contracts and keep OpenRouter as a secondary fallback inside OpenClaw's fallback chain.

Which model offers the cheapest near-frontier coding agent in May 2026?

DeepSeek V4 Pro scores 80.6% on SWE-bench Verified at $0.435 input and $0.87 output per million tokens, roughly one-thirtieth of Claude Opus 4.7's output price. Pair it with Gemini 3.1 Pro for one-million-token context and multimodal coverage.

OpenRouter Rankings May 2026: Chinese 52% Token vs Anthropic 46% Dollar — Stratified LLM Competition and an OpenClaw Multi-Model Routing Decision Matrix

Three rankings on OpenRouter tell three different stories. Chinese vendors now process 52% of tokens. Anthropic books 46% of dollars on a mere 12% token share. This guide turns those numbers into a concrete multi-model routing matrix for OpenClaw gateways on remote Mac nodes.

1. Three numbers that define May 2026

The first thing to internalise is that the three rankings answer different questions. Two describe OpenRouter traffic (tokens and dollars); the third is the independent SWE-bench Verified coding benchmark:

Token leaderboard. Xiaomi MiMo-V2-Pro sits at number one with more than 4.65 trillion weekly tokens. Anthropic's Sonnet 4.6 is second, Alibaba's Qwen 3.6 Plus is third. Volume is the favourite metric of vendors that compete on price.
Dollar leaderboard. Anthropic captures 46.3% of platform revenue. OpenAI follows with 24.2%. The entire Chinese fleet combined collects roughly 13%. Dollars are the favourite metric of vendors that compete on quality.
SWE-bench Verified. GPT-5.5 leads at 88.7%, Claude Opus 4.7 at 87.6%, Gemini 3.1 Pro and DeepSeek V4 Pro both at 80.6%, Kimi K2.6 and MiniMax M2.5 at 80.2%. Code is the favourite metric of teams building agents.

2. Tokens vs dollars: a structural scissor gap

Anthropic's token share fell from 25% a year ago to 12% today, while its dollar share climbed to 46.3%. Google followed a similar path: its token share fell from 37% to 13%, yet its absolute revenue still grew. The mechanism is pricing power. Opus 4.6 lists at $5 input and $25 output per million tokens and earns roughly $22.58 million per month across twenty-two of the platform's top applications. Sonnet 4.6 books $19.65 million at $3 input and $15 output, spread across twenty-three applications. MiMo-V2-Pro, despite handling 5.5 trillion tokens, generates only $7.68 million at a blended $1.50 per million across fifteen applications.
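Dollar share divided by token share is a quick way to quantify the pricing power described above: it measures how much a vendor earns per token relative to the platform average (1.0 means exactly average). A minimal sketch using only the shares quoted in this article:

```python
# Shares quoted in the article, in percent of OpenRouter tokens and dollars.
SHARES = {
    "Anthropic": {"tokens": 12.0, "dollars": 46.3},
    "Chinese vendors (combined)": {"tokens": 52.0, "dollars": 13.0},
}


def revenue_multiplier(tokens_pct: float, dollars_pct: float) -> float:
    """Vendor price per token relative to the platform-average price per token.

    vendor_price = (dollar_share * D) / (token_share * T) = (dollar_share / token_share) * (D / T)
    """
    return dollars_pct / tokens_pct


for vendor, s in SHARES.items():
    print(f"{vendor}: {revenue_multiplier(s['tokens'], s['dollars']):.2f}x platform average")

anthropic = revenue_multiplier(12.0, 46.3)
chinese = revenue_multiplier(52.0, 13.0)
print(f"Anthropic earns ~{anthropic / chinese:.0f}x more per token than the Chinese cluster")
```

With these shares Anthropic earns about 3.9 times the platform-average price per token, the Chinese cluster about a quarter of it, a gap of roughly fifteen times per token.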

The lesson is to stop reading a single ranking as a verdict. The market is stratifying, not choosing. A premium lane and a commodity lane have separated. Both grew. Different lanes reward different metrics. A pragmatic team uses both rather than picking a winner.

Read carefully, the scissor gap is also a structural warning. The premium lane is highly concentrated and depends on continued willingness to pay frontier prices. The commodity lane is highly fragmented and depends on continued willingness to subsidise inference. If either condition breaks, the relative positions on the leaderboard will shift faster than most architecture diagrams can keep up. Building a routing layer today is therefore not an optional optimisation; it is the cheapest insurance policy against the next sudden re-rank.

It is also worth noting that the total market grew roughly elevenfold in twelve months. In a market that expands that quickly, holding a stable percentage means absolute volume is rising sharply. Anthropic and Google both grew in absolute terms even while losing relative share, which is why their teams continue to ship faster than the public narrative suggests. The leaderboard is a relative scoreboard inside an expanding pie, and that distinction matters when you are sizing capacity, negotiating contracts or choosing a long-term partner.
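The expanding-pie point is simple arithmetic: absolute growth equals the ratio of new to old share multiplied by total market growth. A small sketch using the shares and the roughly elevenfold market growth stated in this article:

```python
def absolute_growth(share_then: float, share_now: float, market_growth: float) -> float:
    """Growth in absolute token volume given relative shares and total market growth."""
    return (share_now / share_then) * market_growth


MARKET_GROWTH = 11.0  # "roughly eleven times in twelve months"

for name, then, now in [
    ("Anthropic", 25.0, 12.0),
    ("Google", 37.0, 13.0),
    ("Chinese cluster", 15.0, 52.0),
]:
    growth = absolute_growth(then, now, MARKET_GROWTH)
    print(f"{name}: share {then:.0f}% -> {now:.0f}%, absolute tokens x{growth:.1f}")
```

With these inputs Anthropic's absolute token volume still rose about 5.3 times and Google's about 3.9 times, while the Chinese cluster grew roughly 38 times.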

3. The Chinese victory formula

Chinese-origin models held 15% of the platform a year ago and almost all of that share belonged to DeepSeek. By May 2026 the cluster passed 52%, with five vendors carving distinct lanes:

Xiaomi MiMo-V2-Pro. Aggressive free-tier promotion, raw throughput, $1 input and $3 output. Ideal for retrieval-augmented generation, batch document processing and embedding pipelines where below-frontier output quality is acceptable.
Alibaba Qwen 3.6 Plus. A hybrid mixture-of-experts architecture that lands in twenty-seven of the top thirty OpenRouter applications. The pragmatic generalist fallback for cost-sensitive production traffic.
DeepSeek V4 Pro. Reasoning specialist scoring 80.6% on SWE-bench Verified for $0.435 input and $0.87 output. Roughly thirty times cheaper than Opus per output token, with near-frontier coding accuracy.
Moonshot Kimi K2.6. A 128K context model with strong long-horizon agentic behaviour. SWE-bench Verified 80.2% at $0.75 and $3.50 per million. Useful for repository-wide refactors and multi-turn coding loops.
MiniMax M2.5. Multimodal creative output, with $0.30 input and $1.20 output. Excellent for marketing, summary and lightweight vision workloads.
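The list prices above only become comparable once they are collapsed to one number per workload. The sketch below computes a blended price and the cost of one batch job; the 3:1 input-to-output weighting and the 10M/2M-token job are assumptions, though the weighting happens to reproduce the blended $1.50 quoted for MiMo-V2-Pro in section 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens

    def blended(self, input_ratio: float = 3.0, output_ratio: float = 1.0) -> float:
        """Weighted average price per million tokens for an assumed input:output mix."""
        total = input_ratio + output_ratio
        return (self.input_per_m * input_ratio + self.output_per_m * output_ratio) / total

    def job_cost(self, input_tokens: int, output_tokens: int) -> float:
        """USD cost of one job with the given token counts."""
        return (input_tokens * self.input_per_m + output_tokens * self.output_per_m) / 1_000_000


# List prices as quoted in this article.
PRICES = {
    "MiMo-V2-Pro": Price(1.00, 3.00),
    "DeepSeek V4 Pro": Price(0.435, 0.87),
    "Kimi K2.6": Price(0.75, 3.50),
    "MiniMax M2.5": Price(0.30, 1.20),
    "Claude Opus 4.6": Price(5.00, 25.00),
}

JOB = (10_000_000, 2_000_000)  # hypothetical batch: 10M input, 2M output tokens

for name, price in sorted(PRICES.items(), key=lambda kv: kv[1].job_cost(*JOB)):
    print(f"{name:16s} blended ${price.blended():.2f}/M  job ${price.job_cost(*JOB):,.2f}")
```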

4. SWE-bench Verified: capability divided by output price

The benchmark leaderboard ranks models by accuracy, but a coding agent's bill is dominated by output tokens. A fairer comparison divides accuracy (in percentage points) by output price (in dollars per million tokens). GPT-5.5 returns roughly 2.96, Claude Opus 4.7 about 3.50, Gemini 3.1 Pro about 6.72, Kimi K2.6 about 22.9, and DeepSeek V4 Pro about 92.6. Run the same agent loop on the same task, and the monthly invoice can differ by more than an order of magnitude depending on which model you pick.
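The ratio is easy to reproduce. In this sketch the accuracies come from section 1; the GPT-5.5 ($30) and Gemini 3.1 Pro ($12) output prices are not stated in the article and are back-calculated assumptions chosen because they reproduce the quoted ratios. The 50-million-token monthly volume is hypothetical.

```python
# (SWE-bench Verified accuracy %, output price in USD per million tokens)
MODELS = {
    "GPT-5.5": (88.7, 30.00),          # output price assumed (back-calculated)
    "Claude Opus 4.7": (87.6, 25.00),
    "Gemini 3.1 Pro": (80.6, 12.00),   # output price assumed (back-calculated)
    "Kimi K2.6": (80.2, 3.50),
    "DeepSeek V4 Pro": (80.6, 0.87),
}


def accuracy_per_dollar(accuracy_pct: float, output_price: float) -> float:
    """Accuracy points bought per dollar of output-token price."""
    return accuracy_pct / output_price


ranked = sorted(MODELS.items(), key=lambda kv: accuracy_per_dollar(*kv[1]), reverse=True)
for name, (acc, price) in ranked:
    print(f"{name:16s} {acc:5.1f}% / ${price:5.2f} = {accuracy_per_dollar(acc, price):6.2f}")

monthly_output_m = 50  # hypothetical: 50 million output tokens per month
for name, (_, price) in MODELS.items():
    print(f"{name}: ${monthly_output_m * price:,.2f} output spend per month")
```

At that hypothetical volume, output spend ranges from about $43.50 on DeepSeek V4 Pro to $1,500 on GPT-5.5 at the assumed price.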

That arithmetic is not an argument to drop frontier models. It is an argument to route them carefully. Use Opus or GPT-5.5 for the planning steps that determine whether the agent will succeed at all, then hand off the long output-heavy phases to a cheaper near-frontier model. Most modern agent frameworks support per-step model selection. OpenClaw exposes this through skill metadata, so the architecture choice can live next to the prompt rather than buried in a separate router.
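A minimal sketch of per-step selection, independent of any framework: planning stays on the frontier model and output-heavy execution steps move to the cheaper one. This is not OpenClaw's skill-metadata mechanism; the step structure, token counts and `pick_model` rule are hypothetical, and Opus 4.7's input price is assumed equal to the $5 Opus 4.6 list price.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    kind: str           # "plan" or "execute"
    input_tokens: int
    output_tokens: int


# USD per million tokens (input, output); Opus 4.7 input price is an assumption.
PRICES = {
    "claude-opus-4.7": (5.00, 25.00),
    "deepseek-v4-pro": (0.435, 0.87),
}


def pick_model(step: Step) -> str:
    # Planning decides whether the run succeeds at all: keep it on the frontier model.
    # Long, output-heavy execution steps go to the cheaper near-frontier model.
    return "claude-opus-4.7" if step.kind == "plan" else "deepseek-v4-pro"


def cost(model: str, step: Step) -> float:
    inp, out = PRICES[model]
    return (step.input_tokens * inp + step.output_tokens * out) / 1_000_000


# Hypothetical run: one planning step followed by ten editing steps.
steps = [Step("plan", "plan", 40_000, 4_000)] + [
    Step(f"edit-{i}", "execute", 30_000, 8_000) for i in range(10)
]

routed = sum(cost(pick_model(s), s) for s in steps)
opus_only = sum(cost("claude-opus-4.7", s) for s in steps)
print(f"routed: ${routed:.2f}  opus-only: ${opus_only:.2f}")
```

With these made-up token counts the routed run costs about $0.50 against $3.80 when every step runs on Opus; the planning step alone accounts for $0.30 of the routed total.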

The harder benchmark, SWE-bench Pro, also re-orders the table. Kimi K2.6 scores 58.6% on SWE-bench Pro, which is higher than GPT-5.4's 57.7%. That kind of inversion is exactly why a portfolio approach is more robust than a single-vendor commitment. Average performance on the easy benchmark does not predict the hard tail, and your production agent will eventually meet the hard tail.

5. Four scenarios and their deployment paths

Scenario	Primary model	Fallback chain	Recommended path
Cost extreme (batch RAG)	DeepSeek V4 Flash $0.14 / $0.28	MiniMax M2.5, MiMo-V2-Pro	OpenRouter direct with auto fallback
Coding extreme (agents)	Claude Opus 4.7 or GPT-5.5	Gemini 3.1 Pro, DeepSeek V4 Pro	Official direct, OpenRouter as safety net
Long context plus multimodal	Gemini 3.1 Pro at 1M context	Claude Sonnet 4.6, Kimi K2.6	Direct Google plus local Ollama fallback
Sensitive or offline	Local Ollama with Qwen or DeepSeek	Compliant official API	Remote Mac 7x24 with gateway allowlist
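The matrix can also live in code so that a request is classified before it is routed. The sketch below transcribes the table; the `classify` rule, its check order and the 200K-token threshold are hypothetical, and the model names are descriptive labels rather than verified provider model IDs.

```python
from typing import NamedTuple


class Route(NamedTuple):
    primary: str
    fallbacks: tuple[str, ...]
    path: str


# Transcribed from the scenario table above.
MATRIX = {
    "cost_extreme": Route(
        "DeepSeek V4 Flash", ("MiniMax M2.5", "MiMo-V2-Pro"),
        "OpenRouter direct with auto fallback"),
    "coding_extreme": Route(
        "Claude Opus 4.7",  # the table allows GPT-5.5 here as well
        ("Gemini 3.1 Pro", "DeepSeek V4 Pro"),
        "Official direct, OpenRouter as safety net"),
    "long_context_multimodal": Route(
        "Gemini 3.1 Pro", ("Claude Sonnet 4.6", "Kimi K2.6"),
        "Direct Google plus local Ollama fallback"),
    "sensitive_offline": Route(
        "Local Ollama (Qwen or DeepSeek)", ("Compliant official API",),
        "Remote Mac 7x24 with gateway allowlist"),
}


def classify(sensitive: bool, coding_agent: bool, context_tokens: int, multimodal: bool) -> str:
    """Map workload traits to a scenario row (hypothetical rule and threshold)."""
    if sensitive:
        return "sensitive_offline"  # checked first: data residency overrides cost and quality
    if multimodal or context_tokens > 200_000:
        return "long_context_multimodal"
    if coding_agent:
        return "coding_extreme"
    return "cost_extreme"


def chain(scenario: str) -> list[str]:
    route = MATRIX[scenario]
    return [route.primary, *route.fallbacks]


print(chain(classify(sensitive=False, coding_agent=True, context_tokens=60_000, multimodal=False)))
```

The check order encodes priorities: a sensitive request never reaches a cloud-first row, and a coding agent with a very large context is sent to the long-context lane.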

6. OpenClaw routing in practice

Translate the matrix into a real configuration. Set the primary model under agents.defaults, list the fallback chain in order of preference under fallbacks, and split cliBackends so short interactive calls do not share a queue with long batch jobs. A typical setup:

```shell
openclaw config set agents.defaults.model "anthropic/claude-opus-4.7"
openclaw config set agents.defaults.fallbacks \
  "openrouter/gemini-3.1-pro,openrouter/deepseek-v4-pro,openrouter/kimi-k2.6"
openclaw gateway restart
openclaw channels status --probe
openclaw doctor
```

OpenClaw automatically walks the chain on 429 rate limits, context overflow and provider timeouts. For the in-depth incident playbook, see "Channel online but silent (429)"; for the xAI Grok and short-lived token setup, see the "v2026.5.19 deployment guide"; and for the local Ollama hybrid approach, see "OpenClaw installation troubleshooting".
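To make the chain-walking behaviour concrete, here is a minimal, generic sketch of the pattern. It is not OpenClaw's implementation: the exception classes, `call_with_fallbacks` and the simulated outage are hypothetical stand-ins for whatever a real provider client raises.

```python
import logging
from typing import Callable, List, Tuple

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("router-sketch")


class ProviderError(Exception):
    """Errors that should move the request to the next model in the chain."""


class RateLimited(ProviderError):
    """The provider answered HTTP 429."""


class ContextOverflow(ProviderError):
    """The prompt exceeded this model's context window."""


class ProviderTimeout(ProviderError):
    """The provider did not answer in time."""


def call_with_fallbacks(chain: List[str], call: Callable[[str], str]) -> Tuple[str, str]:
    """Walk the chain in order and return (model, reply) from the first success.

    Only ProviderError subclasses trigger a fallback; anything else (for example
    a bug in the caller) propagates immediately instead of being masked.
    """
    failures = []
    for i, model in enumerate(chain):
        try:
            return model, call(model)
        except ProviderError as exc:
            nxt = chain[i + 1] if i + 1 < len(chain) else "<none left>"
            # Log every transition so a postmortem can reconstruct the path.
            log.warning("fallback %s -> %s (%s)", model, nxt, type(exc).__name__)
            failures.append(f"{model}: {type(exc).__name__}")
    raise RuntimeError("all providers failed: " + "; ".join(failures))


CHAIN = [
    "anthropic/claude-opus-4.7",
    "openrouter/gemini-3.1-pro",
    "openrouter/deepseek-v4-pro",
    "openrouter/kimi-k2.6",
]


def fake_call(model: str) -> str:
    # Simulated outage: the primary is rate limited and the first fallback times out.
    if model == CHAIN[0]:
        raise RateLimited()
    if model == CHAIN[1]:
        raise ProviderTimeout()
    return f"ok from {model}"


model, reply = call_with_fallbacks(CHAIN, fake_call)
print(model, reply)
```

Retrying only on a named family of provider errors is the key design choice: a malformed request or an authentication failure should surface at once rather than burn through every fallback.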

A small operational discipline pays for itself within weeks. Log the provider transition on every fallback, then graph the rate of forced retries by hour. A spike usually leads any visible outage by ten or twenty minutes, and that early warning is enough to switch the primary in advance of a customer-visible incident. Pair the graph with a synthetic probe that exercises every model in the chain at low volume so that a silent regression on a backup model does not surface only when the primary is already down.
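The hourly spike check can start as a few lines over the logged fallback timestamps. A minimal sketch, assuming you already extract one timestamp per provider transition; the `factor` and `min_events` thresholds are hypothetical and need tuning per deployment.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List


def hourly_counts(events: List[datetime]) -> Dict[datetime, int]:
    """Count fallback events per hour, including hours with zero events."""
    if not events:
        return {}
    buckets: Dict[datetime, int] = {}
    for e in events:
        hour = e.replace(minute=0, second=0, microsecond=0)
        buckets[hour] = buckets.get(hour, 0) + 1
    counts: Dict[datetime, int] = {}
    hour, end = min(buckets), max(buckets)
    while hour <= end:
        counts[hour] = buckets.get(hour, 0)  # empty hours keep the baseline honest
        hour += timedelta(hours=1)
    return counts


def spike_hours(counts: Dict[datetime, int], factor: float = 3.0,
                min_events: int = 5) -> List[datetime]:
    """Hours with at least `min_events` fallbacks and more than `factor` x the median hour."""
    if not counts:
        return []
    baseline = median(counts.values())
    return sorted(h for h, n in counts.items() if n >= min_events and n > factor * baseline)


# Synthetic log: two fallbacks per hour for ten hours, then a burst of twelve.
base = datetime(2026, 5, 1)
events = [base + timedelta(hours=h, minutes=m) for h in range(10) for m in (5, 35)]
events += [base + timedelta(hours=10, minutes=m) for m in range(0, 60, 5)]
print(spike_hours(hourly_counts(events)))
```

Using the median rather than the mean keeps a single bad hour from raising the baseline and hiding the next one.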

One more practical note. The output token cap, not the input context window, often becomes the hidden bottleneck of an agent loop. When a fallback model has a smaller cap than the primary, the agent may complete the planning step on the primary, then truncate the final patch on the backup. Add an explicit per-step maxOutputTokens and let the router pick a model that can honour that ceiling.
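A minimal sketch of honouring that ceiling: filter the chain, in order, down to models whose output cap can satisfy the step. The caps below are hypothetical placeholders, not documented limits; replace them with each provider's published values.

```python
from typing import Dict, List

# Hypothetical output-token caps: replace with each provider's documented limits.
OUTPUT_CAPS: Dict[str, int] = {
    "anthropic/claude-opus-4.7": 64_000,
    "openrouter/gemini-3.1-pro": 64_000,
    "openrouter/deepseek-v4-pro": 32_000,
    "openrouter/kimi-k2.6": 16_000,
}


def eligible_chain(chain: List[str], max_output_tokens: int,
                   caps: Dict[str, int] = OUTPUT_CAPS) -> List[str]:
    """Keep chain order but drop models whose cap is below the step's ceiling.

    Models with no known cap are dropped too, so an unknown limit never
    silently truncates a patch.
    """
    usable = [m for m in chain if caps.get(m, 0) >= max_output_tokens]
    if not usable:
        raise ValueError(f"no model in the chain can emit {max_output_tokens} tokens")
    return usable


CHAIN = list(OUTPUT_CAPS)
print(eligible_chain(CHAIN, 48_000))  # e.g. a large final patch
print(eligible_chain(CHAIN, 8_000))   # a short planning step can use the whole chain
```

Failing loudly when no model qualifies is deliberate: a clear error at routing time is cheaper to debug than a truncated patch discovered in review.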

7. Risks and the remote Mac bridge

Three forward-looking risks should be priced into the fallback chain today:

Free tiers will close. The Xiaomi and Qwen promotions are not permanent. Keep a second Chinese model and a Western anchor in every chain so a single policy change does not strand your agents.
Data sovereignty. Before any user text leaves the box, scope workspaceAccess per business line and prefer per-environment credentials over global keys.
Vendor lock-in. Move API keys into SecretRef, parametrise the model identifier and version, and rehearse a thirty-second switch drill so an outage does not become an incident.
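The first rule above can be enforced mechanically in CI. A minimal lint sketch, assuming model identifiers contain a recognisable family keyword; the keyword table is illustrative and covers only the models named in this article.

```python
from typing import List, Optional

# Keyword -> origin for the model families named in this article (illustrative only).
ORIGIN_BY_KEYWORD = {
    "mimo": "cn", "qwen": "cn", "deepseek": "cn", "kimi": "cn", "minimax": "cn",
    "claude": "west", "gpt": "west", "gemini": "west",
}


def origin(model_id: str) -> Optional[str]:
    name = model_id.lower()
    for keyword, region in ORIGIN_BY_KEYWORD.items():
        if keyword in name:
            return region
    return None  # unknown family: counts toward neither requirement


def lint_chain(chain: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means the chain passes."""
    regions = [origin(m) for m in chain]
    problems = []
    if regions.count("cn") < 2:
        problems.append("needs a second Chinese-vendor model in case one free tier closes")
    if "west" not in regions:
        problems.append("needs a Western anchor model")
    return problems


chain = [
    "anthropic/claude-opus-4.7",
    "openrouter/gemini-3.1-pro",
    "openrouter/deepseek-v4-pro",
    "openrouter/kimi-k2.6",
]
print(lint_chain(chain) or "chain OK")
```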

8. Frequently asked questions

Does OpenRouter charge a premium over official APIs? Most models price at parity or within five percent of the official list. The platform earns its margin by removing the cost of running multiple billing accounts and the engineering cost of cross-provider retries.

Can local Ollama replace cloud models for serious work? A well-tuned 32B quantised model still trails frontier cloud models by roughly fifteen to twenty points on SWE-bench Verified. The tradeoff is acceptable for internal tools, offline batch jobs and sensitive data, but cloud models remain stronger on the hardest tasks.

Does OpenClaw support automatic provider failover natively? Yes. From version 2026.4 onward the gateway retries down the fallbacks chain and the gateway log records the exact provider transition, which simplifies postmortems.

How often should I review the routing matrix? A monthly review aligned with each vendor's pricing notice and quarterly benchmark refresh is enough for most teams. Promote a fallback to primary only when three consecutive weeks of synthetic probes show parity or better, and only after the workload has been replayed on a staging environment with realistic latency and cost telemetry.
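The promotion rule is strict enough to encode directly. A minimal sketch, assuming weekly probe results are stored with a monotonically increasing week index; the `WeeklyProbe` record and the optional `tolerance` for "parity" are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WeeklyProbe:
    week: int               # monotonically increasing week index (hypothetical field)
    candidate_score: float  # synthetic-probe pass rate of the fallback being considered
    primary_score: float    # same probe suite run against the current primary


def ready_to_promote(history: List[WeeklyProbe], staging_replay_passed: bool,
                     weeks_required: int = 3, tolerance: float = 0.0) -> bool:
    """N consecutive recent weeks at parity or better, plus a completed staging replay."""
    if not staging_replay_passed or len(history) < weeks_required:
        return False
    recent = sorted(history, key=lambda p: p.week)[-weeks_required:]
    weeks = [p.week for p in recent]
    consecutive = weeks == list(range(weeks[0], weeks[0] + weeks_required))
    at_parity = all(p.candidate_score >= p.primary_score - tolerance for p in recent)
    return consecutive and at_parity


history = [
    WeeklyProbe(1, 0.80, 0.82),
    WeeklyProbe(2, 0.83, 0.82),
    WeeklyProbe(3, 0.84, 0.82),
    WeeklyProbe(4, 0.82, 0.82),
]
print(ready_to_promote(history, staging_replay_passed=True))
```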

9. Closing: stop picking a winner, start running a stratified portfolio

The real signal from OpenRouter's May 2026 board is not that China won or that Anthropic lost. The signal is that the large language model market has split into two coexisting lanes. A premium lane keeps paying for frontier quality. A commodity lane keeps absorbing volume at near-zero margin. Any serious team needs a portfolio strategy, not a single bet, and OpenClaw's multi-provider routing turns that strategy from a slide deck into a config file.

A routing matrix, however, only solves the software side of the problem. It cannot keep a laptop awake when the lid closes, prevent a Windows host from sleeping, or stop the kernel's out-of-memory killer from terminating the gateway on a low-memory VPS just as the fallback chain reaches its safest entry. The hardware layer matters because the most carefully designed retry only helps if the gateway is alive at the moment of the retry. Hosting the gateway, credentials, workspace and SFTP synchronisation baseline on a power-stable, network-stable macOS node is what turns a stratified routing plan into stratified availability. SFTPMAC remote Mac rentals deliver Apple Silicon nodes tuned for OpenClaw and OpenRouter: native launchd supervision, low-latency channel callbacks, and an operational baseline that links cleanly to the xAI Grok integration, the 429 incident playbook and the gateway restart guides referenced above. Treat them as the production substrate that lets your portfolio strategy survive its first real outage.