Ainfera
inference routing for AI agents · live

Never pick a model again.

Your agents call one endpoint. Ainfera routes every request to the best model under your caps — and signs it.

Start building → · for agents → /llms.txt
/// routing pipeline · real decisions · cycles every 3.2s · live from /v1/audit/public
your agent (tulkas) → one endpoint · the agent doesn't pick
ainfera picks: gemini-3-1-pro ← chosen · gpt-5-5 · claude-opus-4-7 · claude-sonnet-4-6
cost · n/a (insufficient traffic)
signed · #1,400 · 0xcca3…376d · Ed25519 · gemini
/// routing across · 10 active models · 5 of 11 brands routable today · live
routed independently per call · live catalog at /v1/catalog/brands · leaderboard →
Models · live catalog: 10
Audit chain · cumulative: #4,953
Audit signature: Ed25519
Routing · status: active
/// how it routes

Filter. Score. Pick.

The cheapest model that clears your caps, chosen in three steps in under 30 ms.

01 · Filter
M_allowed = models that clear your hard caps (budget, latency) and pass the compliance veto.
02 · Score
q_prior (public benchmark) ⊕ q_empirical (Ainfera's own routed-outcome residual, which compounds with traffic).
03 · Pick
The cheapest model in M_allowed that clears the quality floor. On a tie, prefer the higher q_prior, then the lower latency tier.
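The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Ainfera's router: the `Model` record and its fields, reading ⊕ as plain addition, the numeric `floor`, and the `model-a` to `model-d` catalog are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Model:
    # Hypothetical per-model record; field names are illustrative only.
    slug: str
    est_cost: float           # estimated cost of this call, same unit as the budget cap
    latency_ms: int           # expected latency for this call
    latency_tier: int         # lower tier = faster
    compliant: bool           # outcome of the compliance veto
    q_prior: float            # public-benchmark quality score
    q_empirical: float = 0.0  # learned residual; zero at launch


def route(models, budget, latency_ms, floor):
    # 01 Filter: hard caps plus the compliance veto give M_allowed.
    allowed = [
        m for m in models
        if m.compliant and m.est_cost <= budget and m.latency_ms <= latency_ms
    ]
    # 02 Score: assumption, ⊕ is treated as plain addition.
    scored = [(m, m.q_prior + m.q_empirical) for m in allowed]
    # 03 Pick: keep models at or above the floor; cheapest wins,
    # ties go to higher q_prior, then lower latency tier.
    eligible = [m for m, q in scored if q >= floor]
    if not eligible:
        return None
    return min(eligible, key=lambda m: (m.est_cost, -m.q_prior, m.latency_tier))


catalog = [
    Model("model-a", 0.010, 900, 2, True, 0.82),
    Model("model-b", 0.004, 700, 1, True, 0.78),
    Model("model-c", 0.002, 400, 1, False, 0.90),  # fails the compliance veto
    Model("model-d", 0.020, 1200, 3, True, 0.95),  # over the budget cap
]
choice = route(catalog, budget=0.012, latency_ms=1500, floor=0.75)
print(choice.slug if choice else "no model clears the caps")  # model-b
```

Raising `floor` to 0.80 drops `model-b` and the pick moves to the pricier `model-a`: the floor trades cost for quality, while the caps stay hard limits.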

read the methodology →

/// the moat · q_empirical

Every call makes the next one smarter.

Settled call → labeled outcome → smarter routing. The loop compounds with traffic. q_empirical is zero at launch, and it is the only term in Q(m, x, a) that can't be copied from a spec.
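One way such a residual term could behave is sketched below. It is an assumption, not Ainfera's published method: labeled outcomes are taken to be scores on the same scale as q_prior, and q_empirical is a mean residual shrunk toward zero by a hypothetical pseudo-count, so it starts at exactly zero and only moves as settled calls accumulate. `EmpiricalScores` and `shrinkage` are illustrative names.

```python
from collections import defaultdict


class EmpiricalScores:
    """Hypothetical q_empirical tracker: a shrunk mean of outcome residuals."""

    def __init__(self, q_prior, shrinkage=20.0):
        self.q_prior = q_prior          # slug -> public-benchmark score
        self.shrinkage = shrinkage      # pseudo-count pulling estimates toward 0
        self.residual_sum = defaultdict(float)
        self.count = defaultdict(int)

    def record(self, slug, outcome):
        # A settled call yields a labeled outcome; store how far it beat the prior.
        self.residual_sum[slug] += outcome - self.q_prior[slug]
        self.count[slug] += 1

    def q_empirical(self, slug):
        # Zero with no traffic; approaches the raw mean residual as calls accumulate.
        n = self.count[slug]
        return self.residual_sum[slug] / (n + self.shrinkage)


scores = EmpiricalScores({"model-a": 0.80})
print(scores.q_empirical("model-a"))  # 0.0
for _ in range(80):
    scores.record("model-a", 0.90)
print(round(scores.q_empirical("model-a"), 3))  # 0.08
```

The shrinkage is a design choice: a handful of lucky calls barely moves the score, while sustained traffic lets the observed residual dominate.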

/// live · signed decisions

Every decision, signed. Verify any one.

Live tail of /v1/audit/public: internal tenants scrubbed server-side, every entry hash-chained, no key needed to read.

time      hash · agent          event · model              seq
19:43:54  0x7396…d798 · tulkas  refunded                   1,403
19:43:54  0x5f08…47b6 · tulkas  created                    1,405
19:43:54  0x4ee3…0022 · tulkas  provider ok · gemini       1,404
19:43:50  0xcca3…376d · tulkas  request · gemini-3-1-pro   1,400
19:43:50  0x2554…0a95 · tulkas  debited                    1,401
19:43:50  0xcad6…d1a0 · tulkas  routed · gemini-3-1-pro    1,402
19:43:49  0x25a8…49df · tulkas  routed                     1,399
19:43:48  0x8b28…2c4c · tulkas  provider err · mistral     1,398
19:43:48  0xf9ff…5ec5 · tulkas  refunded                   1,397
19:43:43  0x3407…4f19 · tulkas  debited                    1,395
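The hashing scheme behind the feed isn't spelled out here, so the following is only a sketch of what checking a hash chain involves. It assumes each entry's hash is SHA-256 over the previous hash plus the event as canonical JSON, starting from a hypothetical all-zero genesis value; `GENESIS`, `entry_hash`, `build_chain` and `verify_chain` are illustrative names. The per-entry Ed25519 signature check is a separate step against Ainfera's public key and is not shown.

```python
import hashlib
import json

GENESIS = "0x" + "00" * 32  # hypothetical starting hash before the first entry


def entry_hash(prev_hash, event):
    # Assumed scheme: SHA-256 over the previous hash plus the event as canonical JSON.
    payload = prev_hash + json.dumps(event, sort_keys=True, separators=(",", ":"))
    return "0x" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(events):
    # Demo helper: link events in seq order the way a server might.
    chain, prev = [], GENESIS
    for event in sorted(events, key=lambda e: e["seq"]):
        h = entry_hash(prev, event)
        chain.append({"event": event, "prev": prev, "hash": h})
        prev = h
    return chain


def verify_chain(chain):
    # Recompute every link; an edited, dropped or reordered entry breaks it.
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry_hash(prev, entry["event"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


events = [
    {"seq": 1, "agent": "tulkas", "event": "created"},
    {"seq": 2, "agent": "tulkas", "event": "routed", "model": "gemini-3-1-pro"},
    {"seq": 3, "agent": "tulkas", "event": "debited"},
]
chain = build_chain(events)
print(verify_chain(chain))              # True
chain[1]["event"]["model"] = "gpt-5-5"  # tamper with one routing decision
print(verify_chain(chain))              # False
```

Because each hash covers its predecessor, rewriting one decision invalidates every later link, which is what lets a keyless reader detect tampering.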
/// keep your SDK

Two lines to switch.

Point your existing OpenAI / Anthropic client at our base URL. Set the model slug. You're routed.

python · openai SDK
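import os

from openai import OpenAI

# Point the stock OpenAI client at Ainfera instead of api.openai.com.
client = OpenAI(
    base_url="https://api.ainfera.ai/v1",
    api_key=os.environ["AINFERA_KEY"],
)

query = "Summarize this support ticket."  # your prompt

res = client.chat.completions.create(
    model="ainfera-inference",  # one slug; Ainfera picks the underlying model
    messages=[{"role": "user", "content": query}],
    # Non-OpenAI fields go through extra_body; caps bound the routing decision.
    extra_body={"caps": {"budget": 0.012, "latency_ms": 1500}},
)

print(res.choices[0].message.content)
# Fields the SDK doesn't model (like audit) come back as plain dicts.
print("verify:", res.audit["id"])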

60-second quickstart → · full docs →
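An agent without an SDK can send the same request directly. The OpenAI SDK posts chat completions to `{base_url}/chat/completions` with a bearer token and merges `extra_body` keys into the top-level JSON, so the sketch below builds that request with the standard library, assuming Ainfera accepts exactly what the SDK would send. `build_request` is a hypothetical helper, and the request is constructed here but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.ainfera.ai/v1"


def build_request(api_key, query, budget, latency_ms):
    # Same JSON the OpenAI SDK sends; "caps" sits at the top level,
    # which is where extra_body merges it.
    body = {
        "model": "ainfera-inference",
        "messages": [{"role": "user", "content": query}],
        "caps": {"budget": budget, "latency_ms": latency_ms},
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("YOUR_AINFERA_KEY", "Summarize this support ticket.", 0.012, 1500)
print(req.get_method(), req.full_url)  # POST https://api.ainfera.ai/v1/chat/completions
# urllib.request.urlopen(req) would send it; the response body is JSON.
```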

/// machine-readable surface

Your agent can read this site.

One canonical index for agents; one OpenAPI for SDKs; one MCP server for tools; one keyless audit feed for verifiers.

llms.txt · canonical agent index
openapi.json · full REST spec
mcp.ainfera.ai · MCP server for tools
/v1/audit/public · no key required
/// shipping next

Settlement (provider-payout) and workflow orchestration.

Today: routing and signed audit, live. Next: wallet-debit settlement to providers once the Singapore entity and the MAS Payment Services Act (PSA) licence are in place, plus multi-step workflow orchestration as a first-class primitive.

Start building → · read routing methodology →