OpenRouter Weekly Token Rankings May 18-24 2026: Billing Data Doesn't Lie — Agent Routing and Remote Mac Guide
During the week of May 18 through May 24, 2026, OpenRouter processed 28.9 trillion tokens globally, up 7.4% for a fifth consecutive weekly gain. DeepSeek V4 Flash led every model at 3.43T (+66%), Chinese-origin models totaled 9.223T for a fourth straight week above the United States at 4.93T, and Anthropic booked roughly 46% of platform revenue on just 12% of tokens. This guide treats that weekly board as billing truth, not marketing theatre, and translates it into OpenClaw routing steps plus a remote Mac deployment matrix.
1. Why billing data is more honest than benchmarks
OpenRouter routes more than 300 models and handles on the order of 100 trillion tokens per month across free and paid tiers. Its public rankings sort models by actual weekly token throughput — the same meter your API key hits when an agent retries a failed tool call at 2 a.m. Benchmarks measure what a model can do in a controlled harness. Billing data measures what thousands of teams chose to pay for, or route for free, when nobody is watching a leaderboard screenshot.
The scale shift is easy to miss without a weekly lens. Roughly one year earlier, OpenRouter processed about 2.4 trillion tokens in a comparable week. The May 18-24 window at 28.9T is roughly a 12x increase. That growth did not come from chat demos. It came from agent loops, batch codegen, retrieval-heavy pipelines, and channel bots that never close the session. When you read "number one on the board," you are reading production economics, not a vendor's cherry-picked eval card.
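The growth multiple quoted above can be checked with a few lines of arithmetic. This is a minimal sketch using only the two weekly totals from the text; the 52-week spacing behind the implied weekly rate is an assumption, since the source says only "roughly one year earlier".

```python
# Weekly token totals quoted in the text, in trillions of tokens.
tokens_year_ago = 2.4    # comparable week roughly one year earlier
tokens_this_week = 28.9  # May 18-24, 2026

# Simple ratio behind the "roughly 12x" claim.
growth_multiple = tokens_this_week / tokens_year_ago

# Assumption: exactly 52 weekly steps separate the two readings.
WEEKS_BETWEEN = 52
implied_weekly_rate = growth_multiple ** (1 / WEEKS_BETWEEN) - 1

print(f"Year-over-year multiple: {growth_multiple:.1f}x")
print(f"Implied average weekly growth: {implied_weekly_rate:.1%}")
```

Under that assumption the average compound rate is a little under 5% per week, which puts the current +7.4% week above the long-run pace.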
That is why we frame this article around a single sentence: billing data doesn't lie. It may lag a stealth launch by a few days. It may overweight free promotions. It still beats guessing from SWE-bench alone when your OpenClaw gateway fires fifty parallel completions overnight.
2. Data source and counting rules
Primary source: openrouter.ai/rankings. OpenRouter publishes weekly token totals, per-model ranks, vendor share, and a split between token share and USD revenue share. Figures in this article reflect the board as of 2026-05-24. Rankings move daily; always confirm the live page before you change production defaults.
Three counting rules matter for architects. First, tokens are prompts plus completions aggregated across every provider path on OpenRouter, including free tiers that still consume router capacity. Second, vendor share rolls up model families, which is why DeepSeek appears multiple times in the Top 10 yet also reports a combined vendor total. Third, regional share is model-origin attribution, not end-user geography — a US team routing Hy3 still increments China's model-origin bucket.
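The three counting rules can be sketched as a small aggregation. This is an illustration only, not OpenRouter's actual pipeline: the `UsageRecord` type, its fields, the token counts, and the `deepseek/hypothetical-second-sku` ID are all invented for the example. Only the two other model IDs appear in this article.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical per-request usage row; not an OpenRouter schema."""
    model: str              # vendor-prefixed ID, e.g. "deepseek/deepseek-v4-flash"
    origin: str             # where the model comes from, not where the caller sits
    prompt_tokens: int
    completion_tokens: int
    free_tier: bool         # free traffic still counts toward totals


def aggregate(records: Iterable[UsageRecord]) -> Tuple[Dict[str, int], Dict[str, int], Dict[str, int]]:
    by_model: Dict[str, int] = defaultdict(int)
    by_vendor: Dict[str, int] = defaultdict(int)
    by_origin: Dict[str, int] = defaultdict(int)
    for record in records:
        # Rule 1: tokens are prompts plus completions, free tier included.
        total = record.prompt_tokens + record.completion_tokens
        # Rule 2: vendor share rolls up every model under the same prefix.
        vendor = record.model.split("/", 1)[0]
        by_model[record.model] += total
        by_vendor[vendor] += total
        # Rule 3: regional share follows model origin, not caller geography.
        by_origin[record.origin] += total
    return dict(by_model), dict(by_vendor), dict(by_origin)


records = [
    UsageRecord("deepseek/deepseek-v4-flash", "China", 900, 100, free_tier=False),
    UsageRecord("deepseek/hypothetical-second-sku", "China", 400, 100, free_tier=True),
    UsageRecord("anthropic/claude-sonnet-4.6", "US", 700, 300, free_tier=False),
]
by_model, by_vendor, by_origin = aggregate(records)
print(by_vendor)
print(by_origin)
```

The two DeepSeek rows stay separate in the per-model view but merge in the vendor view, which is why one vendor can hold several Top 10 slots and still report a single combined total.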
For portfolio routing theory — failover chains, SecretRef hygiene, and the May stratification frame — cross-read our May 2026 stratified LLM competition guide. This weekly piece stays anchored on the May 18-24 numbers and the billing-versus-benchmark tension.
3. 28.9T weekly volume and the China-US split
Global throughput rose 7.4% week over week, continuing a five-week climb. The regional story is sharper: Chinese-origin models processed 9.223T tokens (+19.89%) while US-origin models processed 4.93T (+16.27%). China has led the US for four consecutive weeks on this board — not a single promotional spike, but a sustained volume migration toward low-unit-cost open and semi-open weights.
| Metric | Value | Week-over-week |
|---|---|---|
| Global weekly tokens | 28.9 trillion | +7.4% (fifth weekly gain) |
| China-origin model tokens | 9.223 trillion | +19.89% |
| US-origin model tokens | 4.93 trillion | +16.27% |
| China vs US streak | China ahead | Fourth consecutive week at number one |
Chinese models accounted for under 2% of OpenRouter volume in early 2025 and roughly 45%+ by May 2026 — developers voting with API keys, not press releases. US volume still grew (+16.27%) in absolute terms; Anthropic and Google remain essential for premium reasoning. The board shows both lanes expanding while token share and dollar share diverge.
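The week-over-week percentages in the table imply prior-week totals, which helps when comparing against your own logs. A minimal sketch, assuming each percentage is measured against the immediately preceding week:

```python
def previous_week(current: float, wow_pct: float) -> float:
    """Back out last week's total from this week's total and its % change."""
    return current / (1 + wow_pct / 100)


# (this week's tokens in trillions, week-over-week % change) from the table above
board = {
    "Global": (28.9, 7.4),
    "China-origin": (9.223, 19.89),
    "US-origin": (4.93, 16.27),
}

prior = {label: previous_week(cur, wow) for label, (cur, wow) in board.items()}
for label, (cur, _) in board.items():
    print(f"{label}: {prior[label]:.2f}T -> {cur}T")

# How far ahead China-origin models are this week.
china_to_us = board["China-origin"][0] / board["US-origin"][0]
print(f"China-origin / US-origin: {china_to_us:.2f}x")
```

On these figures the prior week works out to roughly 26.9T globally, 7.69T China-origin, and 4.24T US-origin, with China-origin volume about 1.87 times US-origin volume this week.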
4. Week 4 Top 10 and the DeepSeek portfolio
Model-level ranks for May 18-24 tell a clearer story than vendor slogans. Flash beat Pro on volume. Hy3 Preview held second place even as promotional pricing shifted. Claude Sonnet 4.6 remained the highest-ranked US daily driver. Owl Alpha proved that a zero-list-price agent-specialized model can sit in the top five without a benchmark tour.
| Rank | Model | Vendor | Weekly tokens | Notes |
|---|---|---|---|---|
| 1 | DeepSeek-V4-Flash | DeepSeek (China) | 3.43T (+66%) | Default for agent workflows; aggressive API pricing |
| 2 | Tencent Hy3 Preview | Tencent (China) | 3.07T (+16%) | Strong growth after preview pricing changes |
| 3 | Claude Sonnet 4.6 | Anthropic (US) | 1.35T | Million-token context; enterprise coding workhorse |
| 4 | DeepSeek-V3.2 | DeepSeek (China) | 1.31T | Legacy price-performance; roleplay and long-tail traffic |
| 5 | Owl Alpha | OpenRouter | 1.15T (+29%) | Free agent-focused model; million-class context |
| 6–10 | Gemini 3 Flash / V4-Pro / MiniMax M2.7 / Grok 4.1 Fast / Step 3.5 Flash | Google / DeepSeek / MiniMax / xAI / StepFun | 673B–1.06T each | Multimodal, flagship reasoning, long context, legal, batch |
DeepSeek's portfolio effect is the headline within the headline. Three DeepSeek SKUs placed inside the top nine during this week. Combined DeepSeek-family volume reached approximately 5.74T tokens, up 25.9% week over week, marking a second consecutive week as the top vendor by token count. V4-Pro's permanent price cut to roughly one-quarter of its launch list price reinforced the matrix strategy: Flash owns loops, V3.2 catches price-sensitive legacy routes, V4-Pro handles steps that need heavier reasoning without Opus list pricing.
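The family total can be cross-checked against the per-model rows. This sketch uses only figures quoted above; the remainder is whatever the family total leaves after Flash and V3.2, which would cover V4-Pro plus any DeepSeek SKUs outside the Top 10.

```python
# Weekly tokens in trillions, from the Top 10 table and the family total above.
flash = 3.43
v3_2 = 1.31
family_total = 5.74       # "approximately 5.74T"
family_wow_pct = 25.9

# Volume not explained by the two individually listed SKUs.
implied_remainder = family_total - (flash + v3_2)

# Last week's family total implied by the +25.9% change.
family_prior = family_total / (1 + family_wow_pct / 100)

# Share of the family's volume carried by Flash alone.
flash_share = flash / family_total

print(f"Implied remainder (V4-Pro and others): {implied_remainder:.2f}T")
print(f"Implied prior-week family total: {family_prior:.2f}T")
print(f"Flash share of family volume: {flash_share:.0%}")
```

The remainder of about 1.00T sits inside the 673B to 1.06T range given for ranks six through ten, so the table and the family total are consistent with each other.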
Watch the churn at the boundary. Kimi K2.6 dropped out of the Top 10 this particular week — proof that weekly boards punish resting on last month's narrative. Treat rank six through ten as a volatility band where multimodal and domain-specific models swap places as promotions rotate. Your OpenClaw defaults should not rewrite every Friday; your grey-traffic watchlist should.
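A watchlist of this kind reduces to a diff between two weekly snapshots. The sketch below is illustrative: the current list follows the table above (the order inside ranks six through ten is assumed from the order the row lists them), and the prior-week list is invented apart from the fact that Kimi K2.6 was in it.

```python
from typing import List, Sequence, Tuple


def board_diff(previous: Sequence[str], current: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (entrants, dropouts) between two ranked Top 10 snapshots."""
    prev_set, cur_set = set(previous), set(current)
    entrants = [m for m in current if m not in prev_set]
    dropouts = [m for m in previous if m not in cur_set]
    return entrants, dropouts


def volatility_band(board: Sequence[str], start: int = 6, end: int = 10) -> List[str]:
    """Models at ranks start..end (1-indexed, inclusive)."""
    return list(board[start - 1:end])


current_week = [
    "DeepSeek-V4-Flash", "Tencent Hy3 Preview", "Claude Sonnet 4.6",
    "DeepSeek-V3.2", "Owl Alpha", "Gemini 3 Flash", "DeepSeek-V4-Pro",
    "MiniMax M2.7", "Grok 4.1 Fast", "Step 3.5 Flash",
]
# Hypothetical prior week: placing Kimi K2.6 at rank ten is illustrative only.
previous_week_board = current_week[:9] + ["Kimi K2.6"]

entrants, dropouts = board_diff(previous_week_board, current_week)
print("Entrants:", entrants)
print("Dropouts:", dropouts)
print("Watch band:", volatility_band(current_week))
```

Feeding entrants into a grey-traffic watchlist, rather than into production defaults, keeps the weekly review cheap while still catching movement in the volatile band.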
5. Three model-selection pain points
Teams that copy rank order into production without a scenario map still hit the same three cliffs every quarter.
- Benchmark anchoring on Opus alone. SWE-bench Verified scores justify a premium fallback, not a default for every cron-driven agent. A million-token daily batch at Opus pricing converts a leaderboard win into a finance ticket. Let billing data assign Flash or Hy3 to loops and reserve Opus for steps that measurably reduce error recovery time.
- Free-model innocence. Owl Alpha and other stealth or platform-hosted free models are excellent sandboxes. They may log prompts or change behaviour without semver notice. Never route credentials, customer PII, or unreleased source through a free primary without a contractual data-processing review.
- Model solved, gateway unstable. OpenClaw channels look "online" while the host sleeps. Users experience that as model stupidity. Weekly rankings cannot deliver ROI if `gateway restart` is your most-used command. Fix uptime before you debate rank three versus rank four.
6. Token volume versus dollar revenue: the dual ledger
OpenRouter exposes two leaderboards inside one product: who moved the most tokens, and who collected the most dollars. Those leaderboards disagree on purpose. The gap is not a bug — it is the market structure for AI inference in mid-2026.
| Lane | Representative models | Token behaviour | Revenue behaviour |
|---|---|---|---|
| High value, lower volume | Claude Opus | ~12% token share for Anthropic overall (down from ~25% a year prior) | ~46% of platform revenue for Anthropic overall |
| Mid price, steady utility | Gemini Flash, Sonnet | Stable multimodal and coding share | Mid-tier dollar contribution |
| Ultra-low price, massive volume | DeepSeek, MiniMax, StepFun | Dominates agent, coding, and batch boards | High tokens, lower revenue percentage |
Call it the Anthropic premium paradox if you need a memo-friendly label. Opus-class models can generate on the order of $25 million per month in platform revenue while moving a fraction of DeepSeek's token mass. Nobody is confused in accounting; they are optimising different objectives. Enterprises buy certainty on hard reasoning steps. Startups and agent frameworks buy throughput. OpenClaw deployments that ignore the dual ledger either overspend on defaults or under-provision quality on customer-visible failures.
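The dual ledger can be expressed as revenue intensity: revenue share divided by token share. A minimal sketch using only the 46% and 12% figures quoted in this article; treating everything outside Anthropic as one bucket is a simplification for illustration.

```python
# Shares quoted in the article for Anthropic on OpenRouter.
anthropic_revenue_share = 0.46
anthropic_token_share = 0.12

# Everything else on the platform, lumped together (a simplification).
rest_revenue_share = 1 - anthropic_revenue_share
rest_token_share = 1 - anthropic_token_share

# Revenue per token relative to the platform average (1.0 = average).
anthropic_intensity = anthropic_revenue_share / anthropic_token_share
rest_intensity = rest_revenue_share / rest_token_share

# How many times more revenue an Anthropic token carries than a non-Anthropic one.
premium_ratio = anthropic_intensity / rest_intensity

print(f"Anthropic revenue per token vs platform average: {anthropic_intensity:.2f}x")
print(f"Everyone else vs platform average: {rest_intensity:.2f}x")
print(f"Anthropic vs everyone else: {premium_ratio:.1f}x")
```

On these shares an Anthropic token carries roughly 3.8 times the platform-average revenue and about 6.2 times the revenue of an average non-Anthropic token, which is the gap a routing policy has to budget around.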
7. When benchmark scores invert market share
OpenRouter's own research collaborations and the a16z 2025 AI Usage Report both note a recurring pattern: benchmark leadership and market share often invert. A model that tops a static eval does not automatically top a weekly billing board. Developers shifted spend toward "good enough" coding models with stable tool APIs and aggressive caching — not toward the highest MMLU score available that Tuesday.
Coding-shaped workloads illustrate the shift. Estimates from usage reports put coding-related API traffic at roughly 11% of volume in early 2024 and above 50% by 2026 on major routers. That single category reweights the entire Top 10 toward models that tolerate long contexts, fast cache hits, and repeated JSON tool payloads. Benchmarks still set floors — you should not deploy a model that fails your minimal harness — but billing sets the default.
8. Five steps: weekly board tracking plus OpenClaw routing
Rankings become infrastructure when they live in version-controlled config next to your skills. Execute this sequence on a staging gateway before you repoint production channels.
- Review the board every Monday. Open openrouter.ai/rankings and record Top 10 models, vendor share, and any new entrant within ranks six through ten. Hy3 Preview and Owl Alpha both entered that band before broader press coverage; the board leads narratives by one to three weeks.
- Layer tasks before you layer models. Agent batch and inner-loop retries default to DeepSeek V4 Flash. Enterprise complex reasoning and regulated drafting keep Claude Opus or Sonnet as primary or first fallback. Multimodal ingestion steps route through Gemini 3 Flash unless OCR quality tests fail.
- Write `openclaw.json` with SecretRef. OpenRouter model IDs require vendor prefixes such as `deepseek/deepseek-v4-flash` and `anthropic/claude-sonnet-4.6`. Store API keys in SecretRef or your vault integration; never commit literals to git. Split CLI backends if interactive chat and batch jobs should not share rate-limit buckets.
- Install an always-on gateway on macOS. Run `openclaw gateway install` under launchd on a host that does not sleep. Pair with our gateway restart and launchd guide so upgrades recycle cleanly without orphan processes.
- Grey-traffic, probe, and fall back. Pass `openclaw doctor`, then `openclaw channels status --probe`, then route ten percent of production-shaped traffic through the candidate primary. On HTTP 429 or provider timeout, confirm OpenClaw walks the configured fallback chain automatically before you promote a new default.
openclaw doctor
openclaw channels status --probe
openclaw config get agents.defaults.model
openclaw config get agents.defaults.fallbacks
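The task layering and the ten-percent grey split from the steps above can be prototyped outside the gateway before touching config. This is a sketch under stated assumptions: `TASK_ROUTES`, `in_grey_cohort`, `pick_model`, and the `vendor/candidate-model` ID are hypothetical names for illustration, not OpenClaw features. Only the two real model IDs come from this article.

```python
import hashlib
from typing import Optional

# Hypothetical task-class map; the model IDs are the two quoted in this article.
TASK_ROUTES = {
    "agent_batch": "deepseek/deepseek-v4-flash",
    "complex_reasoning": "anthropic/claude-sonnet-4.6",
}


def in_grey_cohort(request_id: str, percent: int = 10) -> bool:
    """Deterministically place a request in the grey cohort.

    Hashing the ID means the same request always lands in the same bucket,
    so retries do not flip between the candidate and the current primary.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return bucket < percent


def pick_model(task_class: str, request_id: str, candidate: Optional[str] = None) -> str:
    """Return the candidate for grey-cohort requests, else the layered default."""
    if candidate is not None and in_grey_cohort(request_id):
        return candidate
    return TASK_ROUTES[task_class]


# Rough check of the split over synthetic request IDs.
sample_ids = [f"req-{i}" for i in range(10_000)]
grey_share = sum(in_grey_cohort(rid) for rid in sample_ids) / len(sample_ids)
print(f"Grey cohort share: {grey_share:.1%}")
print(pick_model("agent_batch", "req-42", candidate="vendor/candidate-model"))
```

A hash-based split is chosen over random sampling so that a given request ID is reproducible during triage; the cohort size is a parameter rather than a constant so the split can widen as probes succeed.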
Log provider transitions per skill. Spikes in forced fallback often precede public status-page incidents. For silent-channel triage, use the layered playbook in our channel online but no response guide. Routing and uptime debugging are coupled problems; solve them in that order.
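To make the fallback behaviour and the per-skill transition log concrete, here is a minimal client-side model of a fallback chain. It is a sketch of the behaviour to test against, not OpenClaw's implementation: `ProviderError`, `complete_with_fallback`, and the `call` callback are hypothetical, and only HTTP 429 and timeouts trigger a hop, matching the two conditions named above.

```python
import logging
from typing import Callable, List, Optional, Sequence, Tuple

logger = logging.getLogger("routing")


class ProviderError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"provider returned HTTP {status}")
        self.status = status


Transition = Tuple[str, Optional[str], str]  # (from_model, to_model, reason)


def complete_with_fallback(
    chain: Sequence[str],
    call: Callable[[str], str],
    skill: str,
) -> Tuple[str, List[Transition]]:
    """Try each model in order; hop on HTTP 429 or timeout, re-raise anything else."""
    transitions: List[Transition] = []
    for index, model in enumerate(chain):
        try:
            return call(model), transitions
        except TimeoutError:
            reason = "timeout"
        except ProviderError as err:
            if err.status != 429:
                raise  # other failures are not masked by a fallback
            reason = "http_429"
        next_model = chain[index + 1] if index + 1 < len(chain) else None
        transitions.append((model, next_model, reason))
        logger.warning("skill=%s fallback %s -> %s (%s)", skill, model, next_model, reason)
    raise RuntimeError(f"all {len(chain)} models failed for skill {skill!r}")


def flaky_call(model: str) -> str:
    """Stand-in provider call: the primary is rate limited, the fallback answers."""
    if model == "deepseek/deepseek-v4-flash":
        raise ProviderError(429)
    return f"completion from {model}"


text, hops = complete_with_fallback(
    ["deepseek/deepseek-v4-flash", "anthropic/claude-sonnet-4.6"],
    flaky_call,
    skill="nightly-batch",
)
print(text)
print(hops)
```

Counting entries in `hops` per skill per hour gives the forced-fallback rate described above; a sustained rise on one provider is the signal to investigate before promoting or demoting a default.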
9. Remote Mac 24/7 decision matrix
May 2026 models are inexpensive enough to run continuously. Most gateway hosts are not built to stay up continuously. Pick the substrate before you re-litigate Flash versus Sonnet for the third time this month.
| Deployment location | Best for | Primary risk |
|---|---|---|
| Local laptop | Reading the weekly board; one-off debugging | Sleep breaks gateway TCP; cannot sustain weekly agent cadence |
| Small Linux VPS | Stateless API relay without Apple toolchain | RAM pressure under parallel agents; no Xcode or notarisation path |
| SFTPMAC remote Mac | Production OpenClaw plus build artifacts on one node | Requires directory permission planning — mitigated with SFTP/rsync baselines on this blog |
A remote Mac wins on launchd supervision, native macOS paths for workspace and Keychain integration, and SFTP-friendly sync for routing changes. For June cumulative totals and six-scenario matrices, see our June 2026 OpenRouter Top 10 trends guide.
10. FAQ
Should I trust OpenRouter weekly rankings or SWE-bench more? Rankings show what teams actually run and pay for. SWE-bench shows coding ceilings. Use both; let billing data drive default routes and benchmarks set minimum quality floors.
Why does Anthropic earn 46% revenue on 12% tokens? Premium list pricing on Opus and Sonnet. Enterprises pay for hard reasoning. DeepSeek trades ultra-low unit cost for volume, so the gap between token share and revenue share reflects deliberate market segmentation.
How often should I change OpenClaw routing after weekly moves? Review weekly; change primaries only after grey-traffic probes succeed. Watch ranks six through ten for candidates, not just rank one.
How is this article different from the June Top 10 piece? This post locks to May 18-24 weekly totals and the billing-doesn't-lie frame with the Anthropic paradox. The June piece tracks later cumulative leaders and six structural trends.
Did DeepSeek price cuts affect this board? V4-Pro's permanent reduction to roughly one-quarter of its original list price reinforced DeepSeek's multi-SKU portfolio. Expect Flash to keep leading agent volume unless a rival undercuts on cache-aware economics.
11. Summary: vote with billing data, deliver on an always-on gateway
The May 18-24 board is unambiguous. Global volume hit 28.9T tokens with China at 9.223T and the US at 4.93T. DeepSeek V4 Flash led at 3.43T; the DeepSeek family totaled about 5.74T. Anthropic still captures roughly 46% of revenue on 12% of tokens. Chinese open weights won the throughput war; US premium models won the margin war. Agent and coding workloads are the battlefield for both.
Interpreting the board is only half the job. OpenClaw must run primary and fallback chains on a gateway that survives overnight tool loops, channel probes, and Monday-morning config promotions. Laptops sleep. Small VPS instances exhaust RAM during parallel fallbacks. Intermittent hosts turn accurate routing into "it worked last week" reports that waste a day of triage even when OpenRouter itself is healthy.
Once you tag skills, write SecretRef-backed config, and schedule weekly board reviews, migrate gateway and workspace state to a remote Mac with SFTP or rsync rollback baselines. SFTPMAC remote Mac rental provides Apple Silicon 24/7 hosts aligned with the OpenClaw gateway install, channel probe, and stratified routing guides on this blog: infrastructure that lets May's billing data become production ROI instead of a screenshot on a machine you close every evening.