Ainfera
Ranked by agents · sourced from Artificial Analysis

Models, ranked by the agents using them.
Not the marketing pages.

Intelligence · AA Index v4.0 (external, weekly). Cost, latency, reliability · live agent traffic. Every row on chain.

Data partner · Intelligence column sourced from Artificial Analysis (518+ models, refreshed weekly)

Intelligence · vs · Cost

[Chart: Intelligence Index (y-axis, 0 to 100) against $ / 1M tokens (x-axis, in:out 1:3, $0 to $35). Legend: Anthropic, OpenAI, Google, xAI, Mistral · pareto frontier · ★ reasoning model]
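leaderboard · 10 models

| # | Model | Lab | Intelligence | $ / 1M in | $ / 1M out | p50 ms | tok/s | Reliability |
|----|-------|-----|--------------|-----------|------------|--------|-------|-------------|
| 01 | Claude Opus 4.7 (reasoning · thin data) | Anthropic | 73 | $15.0 | $75.0 | 3565 ms | 56 | |
| 02 | GPT-5.5 (reasoning · thin data) | OpenAI | 70 | $5.0 | $15.0 | 12729 ms | 16 | |
| 03 | Gemini 3.1 Pro (reasoning · thin data) | Google | 68 | $1.3 | $10.0 | 11348 ms | 18 | |
| 04 | Grok 4 (reasoning · thin data) | xAI | 65 | $5.0 | $15.0 | 6793 ms | 29 | |
| 05 | Mistral Large 3 (thin data) | Mistral | 60 | $2.0 | $6.0 | 2719 ms | 74 | |
| 06 | Mistral Medium 3 (thin data) | Mistral | 54 | $1.0 | $3.0 | | | |
| 07 | Claude Sonnet 4.6 (thin data) | Anthropic | | $3.0 | $15.0 | 5591 ms | 36 | |
| 08 | Claude Haiku 4.5 (thin data) | Anthropic | | $1.0 | $5.0 | 3438 ms | 58 | |
| 09 | GPT-5 (reasoning · thin data) | OpenAI | | $10.0 | $40.0 | | | |
| 10 | GPT-5 Mini (thin data) | OpenAI | | $2.0 | $8.0 | | | |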
threshold for ranked-row eligibility · 100 decisions · refreshed 5/18/2026, 3:22:40 PM
Live · catalog from /v1/models, leaderboard from /v1/stats/public/leaderboard · verify on chain →
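
The pareto frontier marked on the Intelligence vs Cost chart can be sketched as follows. This is a minimal illustration, not Ainfera's implementation: `Row`, `blended_cost`, and `pareto_frontier` are hypothetical names, and reading "in:out 1:3" as one part input price to three parts output price is an assumption. Only the six rows that report an Intelligence score are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    intelligence: float  # Intelligence column from the leaderboard
    usd_in: float        # $ per 1M input tokens
    usd_out: float       # $ per 1M output tokens


def blended_cost(row: Row, w_in: float = 1.0, w_out: float = 3.0) -> float:
    """Weighted average price per 1M tokens.

    The 1:3 default is an assumed reading of the chart's "in:out 1:3" label.
    """
    return (w_in * row.usd_in + w_out * row.usd_out) / (w_in + w_out)


def pareto_frontier(rows: list[Row]) -> list[Row]:
    """Keep rows that no cheaper-or-equal row matches or beats on intelligence."""
    # Cheapest first; on a cost tie, the higher score comes first.
    ordered = sorted(rows, key=lambda r: (blended_cost(r), -r.intelligence))
    frontier: list[Row] = []
    best = float("-inf")
    for row in ordered:
        if row.intelligence > best:
            frontier.append(row)
            best = row.intelligence
    return frontier


# Values copied from the leaderboard rows that have an Intelligence score.
ROWS = [
    Row("Claude Opus 4.7", 73, 15.0, 75.0),
    Row("GPT-5.5", 70, 5.0, 15.0),
    Row("Gemini 3.1 Pro", 68, 1.3, 10.0),
    Row("Grok 4", 65, 5.0, 15.0),
    Row("Mistral Large 3", 60, 2.0, 6.0),
    Row("Mistral Medium 3", 54, 1.0, 3.0),
]

if __name__ == "__main__":
    for row in pareto_frontier(ROWS):
        print(f"{row.model}: {row.intelligence} at ${blended_cost(row):.2f} / 1M")
```

Under these assumptions Grok 4 drops off the frontier, because GPT-5.5 scores higher at the same list prices.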
methodology

Where the numbers come from.

External benchmark for quality. Live traffic for ops. Proprietary q_empirical captured today, trained at v1.

01 · intelligence · Artificial Analysis Index v4.0 · external benchmark
A composite of 10 evaluations: GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt. Re-run independently and refreshed weekly. This is what the world rates each model on, distinct from how Ainfera routes.
02 · q_prior · Ainfera's routing anchor · v0 · live
What the router uses as its quality input. At v0, q_prior anchors directly to the Intelligence column above for the five frontier backends — same signal, different role. As we calibrate to Ainfera's task taxonomy, q_prior will diverge from the public benchmark.
03 · q_empirical · proprietary residual · v1+ · trajectory
The moat: a learned correction to q_prior over Ainfera's own routed-outcome records. Zero at launch; compounds with every settled §16 record. The only routing-quality signal that can't be reproduced by copying a spec. Today it surfaces as decisions_count on each leaderboard row: the training history accumulating.
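
The relationship between q_prior, q_empirical, and decisions_count described above can be sketched as a prior plus a residual. The learned correction itself is not specified here, so this sketch swaps in a shrunken mean residual as a stand-in. The `QualityEstimate` class, its `record` method, the `shrinkage` constant, and the assumption that observed outcomes are scored on the same scale as q_prior are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QualityEstimate:
    q_prior: float             # anchored to the public Intelligence score at v0
    shrinkage: float = 100.0   # hypothetical: pulls the residual toward zero on thin data
    residual_sum: float = 0.0
    decisions_count: int = 0   # number of settled outcome records seen so far

    def record(self, observed_quality: float) -> None:
        """Fold one settled outcome into the running residual."""
        self.residual_sum += observed_quality - self.q_prior
        self.decisions_count += 1

    @property
    def q_empirical(self) -> float:
        """Stand-in correction: shrunken mean residual, zero with no records."""
        return self.residual_sum / (self.decisions_count + self.shrinkage)

    @property
    def routing_quality(self) -> float:
        """Quality input under this sketch: prior plus residual."""
        return self.q_prior + self.q_empirical


if __name__ == "__main__":
    est = QualityEstimate(q_prior=70.0)
    print(est.routing_quality)      # equals q_prior while no records exist
    for _ in range(100):
        est.record(60.0)            # outcomes consistently below the prior
    print(est.decisions_count, est.routing_quality)
```

With no records the estimate equals q_prior, which matches "zero at launch". As records accumulate, the residual carries more weight.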