How routing works

Pick the model by the result, not the reputation.

Every agent call is hard to place — capability, cost and latency trade off differently each time. Ainfera scores the candidates against the task and routes to the one most likely to finish it. Here's what goes into that, and the proof it leaves behind.

01 · Signals we weigh

Four inputs decide whether a call finishes.

These are the signals every candidate is scored on. How we weigh them is the part that compounds with traffic — so the weights stay ours — but the inputs are no secret.

Task type

What the call is

A drafting call and a tool-use call don't want the same model. We read the shape of the request first.

Cost

What it costs

Live per-token price for each candidate, against the ceiling you set.

Latency

How fast it answers

Measured on rolling production traffic, not vendor-published numbers.

Availability

Whether it's healthy now

A provider that's erroring or rate-limiting this minute drops out, and comes back when it recovers.
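The latency and availability signals above can be pictured as a rolling window over recent calls. The following is a minimal sketch, not Ainfera's implementation: the `RollingHealth` class, the window size and the error-rate threshold are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import median


class RollingHealth:
    """Tracks the most recent calls to one provider (hypothetical helper)."""

    def __init__(self, window: int = 20, max_error_rate: float = 0.5):
        # Both defaults are illustrative assumptions, not published values.
        self.calls = deque(maxlen=window)  # (succeeded, latency_ms) pairs
        self.max_error_rate = max_error_rate

    def record(self, succeeded: bool, latency_ms: float) -> None:
        self.calls.append((succeeded, latency_ms))

    def error_rate(self) -> float:
        if not self.calls:
            return 0.0
        failures = sum(1 for ok, _ in self.calls if not ok)
        return failures / len(self.calls)

    def is_healthy(self) -> bool:
        # A provider drops out while recent errors dominate, and comes
        # back once successful calls push the failures out of the window.
        return self.error_rate() <= self.max_error_rate

    def median_latency_ms(self):
        # Latency is measured only on calls that succeeded.
        ok = [ms for succeeded, ms in self.calls if succeeded]
        return median(ok) if ok else None
```

Because the window is bounded, old failures age out on their own, which is one simple way to get the "comes back when it recovers" behavior without a separate recovery timer.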

Model                           Output speed
GPT-OSS 120B (Groq)             260 tok/s
GPT-OSS 20B (Novita)            255 tok/s
Qwen3.7 Max (Novita)            199 tok/s
Qwen3-Next-80B-A3B-Instruct     191 tok/s
GLM 5.2                         176 tok/s
Qwen3.5-35B-A3B                 163 tok/s
Qwen3.5-122B-A10B               143 tok/s
Qwen3.6-35B-A3B                 140 tok/s
Qwen3-VL-8B-Instruct            139 tok/s
GLM-4.7                         136 tok/s
Qwen3-VL-30B-A3B-Instruct       114 tok/s
Qwen3 Coder 30b A3B Instruct    108 tok/s

Reference output speed (Artificial Analysis). We score live per-call latency on top of this — measured on production traffic, not published numbers.

Intelligence: Artificial Analysis · artificialanalysis.ai

02 · Outcome

We route to the model most likely to finish the task.

Not the biggest name, not a model you pinned six months ago and forgot. The pick is made per call and changes as price, speed and health change — so the cheapest model that still clears the bar is the one that runs.
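As a rough illustration of "the cheapest model that still clears the bar", here is a minimal sketch. It does not reproduce Ainfera's scoring, whose weights are not public: `finish_score` is a stand-in for that score, and the `Candidate` fields, `pick` function and `NoCandidateFits` error are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_call: float   # estimated cost of this call
    latency_ms: float      # live measured latency
    healthy: bool          # provider is not erroring or rate-limiting
    finish_score: float    # assumed 0..1 estimate of finishing the task


class NoCandidateFits(Exception):
    """Raised instead of quietly picking a model outside the caps."""


def pick(candidates, max_cost, max_latency_ms, bar):
    # Keep only healthy candidates that sit inside the caps and clear the bar.
    eligible = [
        c for c in candidates
        if c.healthy
        and c.cost_per_call <= max_cost
        and c.latency_ms <= max_latency_ms
        and c.finish_score >= bar
    ]
    if not eligible:
        raise NoCandidateFits("no healthy candidate clears the bar inside the caps")
    # Cheapest eligible candidate wins; ties go to the faster one.
    return min(eligible, key=lambda c: (c.cost_per_call, c.latency_ms))
```

Running `pick` on every call, with fresh cost, latency and health values, is what lets the choice move as those inputs change.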

Faster isn’t smarter: we pick the point that finishes the task inside your caps. Speed is the Artificial Analysis reference; live per-call latency is scored on top of it.

Intelligence + speed: Artificial Analysis · artificialanalysis.ai

03 · Your controls

You set the box. We pick the model inside it.

Routing is yours to bound. Three controls, settable per agent or per task type.

Caps

Set the box

Per-call cost ceilings and latency targets, per agent or per task type. If nothing fits, we tell you — we never quietly downgrade.

Pins

Force a model

Pin a specific model or provider when you need it, and keep routing everywhere else.

Fallbacks

Stay up

On a 429, 5xx, timeout or refusal we retry the next eligible candidate inside your caps — logged and audited like any other call.
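The fallback behavior can be sketched as a loop over the candidates that already fit the caps. This is an assumption-laden illustration: `Retryable`, `call_with_fallback`, the `send` callback and the shape of the audit entries are hypothetical, not part of any published Ainfera API.

```python
class Retryable(Exception):
    """Stand-in for a 429, 5xx, timeout or refusal (hypothetical type)."""


def call_with_fallback(eligible, send, audit_log):
    """Try each eligible candidate in order until one finishes.

    `eligible` is assumed to be pre-filtered to the caller's caps,
    `send(model)` performs the call, and `audit_log` is a list that
    receives one entry per attempt.
    """
    for model in eligible:
        try:
            result = send(model)
        except Retryable as err:
            # Failed attempts are logged like any other call, then skipped.
            audit_log.append({"model": model, "event": "fallback", "reason": str(err)})
            continue
        audit_log.append({"model": model, "event": "completed"})
        return result
    # Nothing inside the caps succeeded; surface that rather than widen the caps.
    raise RuntimeError("every eligible candidate failed")
```

Iterating only over the pre-filtered list is the design point: a fallback never leaves the box the caller set.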

04 · Proof

Every decision is signed, on a public chain.

No black box and no dashboard claim. Every routed call is hashed, Ed25519-signed, and added to a public append-only chain. Verify any one of them with a single request: no account, no key.
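To show why an append-only chain makes tampering detectable, here is a minimal hash-chain sketch. The record layout, the use of SHA-256 and canonical JSON, and the `append` and `verify` helpers are assumptions for illustration; the real chain format is not described here. The Ed25519 signature over each hash is left out because Python's standard library has no Ed25519 support, so a third-party library would be needed for that step.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder meaning "no previous record"


def record_hash(body: dict, prev_hash: str) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: list, body: dict) -> None:
    # Each record commits to the hash of the record before it.
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "seq": len(chain),
        "body": body,
        "prev": prev_hash,
        "hash": record_hash(body, prev_hash),
    })


def verify(chain: list) -> bool:
    # Recompute every hash from the start; any edit breaks the links.
    prev_hash = GENESIS
    for seq, rec in enumerate(chain):
        if rec["seq"] != seq or rec["prev"] != prev_hash:
            return False
        if rec["hash"] != record_hash(rec["body"], prev_hash):
            return False
        prev_hash = rec["hash"]
    return True
```

Changing any field of an earlier record changes its hash, which no longer matches the `prev` value stored in the next record, so the edit is visible to anyone who replays the chain.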

Trace · live audit

time | hash · agent | event · model | seq