Ranked by agents · sourced from Artificial Analysis
Models, ranked by the agents using them.
Not the marketing pages.
Intelligence · AA Index v4.0 (external, weekly). Cost, latency, reliability · live agent traffic. Every row on chain.
Data partner · Intelligence column sourced from Artificial Analysis (518+ models, refreshed weekly)
Intelligence · vs · Cost
leaderboard · 10 models
| # | Model | Lab | Intelligence | $ / 1M in | $ / 1M out | p50 ms | tok/s | Reliability |
|---|---|---|---|---|---|---|---|---|
| 01 | Claude Opus 4.7 · reasoning · thin data | Anthropic | 73 | $15.0 | $75.0 | 3565 ms | 56 | — |
| 02 | GPT-5.5 · reasoning · thin data | OpenAI | 70 | $5.0 | $15.0 | 12729 ms | 16 | — |
| 03 | Gemini 3.1 Pro · reasoning · thin data | — | 68 | $1.3 | $10.0 | 11348 ms | 18 | — |
| 04 | Grok 4 · reasoning · thin data | xAI | 65 | $5.0 | $15.0 | 6793 ms | 29 | — |
| 05 | Mistral Large 3 · thin data | Mistral | 60 | $2.0 | $6.0 | 2719 ms | 74 | — |
| 06 | Mistral Medium 3 · thin data | Mistral | 54 | $1.0 | $3.0 | — | — | — |
| 07 | Claude Sonnet 4.6 · thin data | Anthropic | — | $3.0 | $15.0 | 5591 ms | 36 | — |
| 08 | Claude Haiku 4.5 · thin data | Anthropic | — | $1.0 | $5.0 | 3438 ms | 58 | — |
| 09 | GPT-5 · reasoning · thin data | OpenAI | — | $10.0 | $40.0 | — | — | — |
| 10 | GPT-5 Mini · thin data | OpenAI | — | $2.0 | $8.0 | — | — | — |
threshold for ranked-row eligibility · 100 decisions · refreshed 5/18/2026, 3:22:40 PM
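A minimal sketch of how the 100-decision threshold and the default Intelligence sort could fit together. It assumes that "thin data" marks a row below the threshold and that a row with exactly 100 decisions is eligible; neither is confirmed above. The `Row` type and the `decisions_count` values are hypothetical, since per-row counts are not shown in the table.

```python
from dataclasses import dataclass
from typing import Optional

ELIGIBILITY_THRESHOLD = 100  # decisions, per the leaderboard footer


@dataclass
class Row:
    model: str
    intelligence: Optional[int]  # None where the table shows a dash
    decisions_count: int         # hypothetical values, not published above


def is_thin(row: Row, threshold: int = ELIGIBILITY_THRESHOLD) -> bool:
    # Assumption: "thin data" means fewer settled decisions than the threshold.
    return row.decisions_count < threshold


def sort_by_intelligence(rows: list) -> list:
    # Descending by score; rows without a score sort last.
    return sorted(rows, key=lambda r: (r.intelligence is None, -(r.intelligence or 0)))


rows = [
    Row("Claude Sonnet 4.6", None, 12),
    Row("Mistral Large 3", 60, 40),
    Row("Claude Opus 4.7", 73, 7),
]
ranked = sort_by_intelligence(rows)
labels = [(r.model, "thin data" if is_thin(r) else "eligible") for r in ranked]
```

Sorting on a `(missing, -score)` tuple keeps unscored models at the bottom without inventing a score for them, which matches how rows 07 to 10 appear in the table.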
Live · catalog from /v1/models, leaderboard from /v1/stats/public/leaderboard · verify on chain →
methodology
Where the numbers come from.
External benchmark for quality. Live traffic for ops. Proprietary q_empirical captured today, trained at v1.
01 · intelligence · Artificial Analysis Index v4.0 external benchmark
A composite of 10 evaluations — GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. Re-run independently, refreshed weekly. This is what the world rates each model on — distinct from how Ainfera routes.
02 · q_prior · Ainfera's routing anchor v0 · live
What the router uses as its quality input. At v0, q_prior anchors directly to the Intelligence column above for the five frontier backends: same signal, different role. As we calibrate to Ainfera's task taxonomy, q_prior will diverge from the public benchmark.
03 · q_empirical · proprietary residual v1+ · trajectory
The moat: a learned correction to q_prior over Ainfera's own routed-outcome records. Zero at launch; compounds with every settled §16 record. The only routing-quality signal that can't be reproduced by copying a spec. Surfaces as decisions_count on each leaderboard row today; that's the training history accumulating.
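One way to picture how q_prior and q_empirical relate, as a sketch only. It assumes the residual is additive (routing quality = q_prior + q_empirical) and uses a plain mean of observed gaps as a stand-in for the learned correction; Ainfera's actual training method is not described here. `ModelQuality`, `record`, `train`, and the observed scores are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelQuality:
    q_prior: float            # at v0, anchored to the Intelligence column
    q_empirical: float = 0.0  # zero at launch, per the text
    residuals: list = field(default_factory=list)

    @property
    def decisions_count(self) -> int:
        # Grows with every captured record, trained or not.
        return len(self.residuals)

    def record(self, observed_quality: float) -> None:
        # Capture only: store the gap between the outcome and the prior.
        self.residuals.append(observed_quality - self.q_prior)

    def train(self) -> None:
        # Stand-in for learning: the mean residual. Not Ainfera's method.
        if self.residuals:
            self.q_empirical = sum(self.residuals) / len(self.residuals)

    def routing_quality(self) -> float:
        # Assumption: the correction is added to the prior.
        return self.q_prior + self.q_empirical


opus = ModelQuality(q_prior=73.0)
opus.record(70.0)  # hypothetical outcome scores on the same scale
opus.record(72.0)
before = opus.routing_quality()  # still the prior: records captured, nothing trained
opus.train()
after = opus.routing_quality()
```

In this sketch, capturing records raises `decisions_count` right away, while routing quality only moves once `train` runs, mirroring "captured today, trained at v1".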