Your agents call one endpoint. Ainfera routes every request to the best model under your caps — and signs it.
The cheapest model that clears your caps. Three steps. Sub-30ms.
M_allowed = models that clear your hard caps (budget, latency) and pass the compliance veto.
q_prior (public benchmark) ⊕ q_empirical (Ainfera's own routed-outcome residual, which compounds with traffic).
q_prior → lower latency tier.

Settled call → labeled outcome → smarter routing. The loop compounds with traffic. q_empirical is zero at launch; it is the only term in Q(m, x, a) that can't be copied from a spec.
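The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Ainfera's router: the `Model` fields, the `route` function, and the `weight` parameter are hypothetical names, ⊕ is assumed to be plain addition of the prior and the residual, and the last line is read as a tie-break order (q_prior first, then lower latency). The page does not say how quality and cost trade off inside the caps, so the sketch simply picks the highest score among allowed models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    cost: float         # estimated cost per call, same unit as the budget cap
    latency_ms: float   # expected latency
    compliant: bool     # False means the compliance veto applies
    q_prior: float      # public-benchmark quality
    q_empirical: float  # routed-outcome residual; 0.0 at launch


def route(models, budget, latency_ms, weight=1.0) -> Optional[Model]:
    # Step 1: M_allowed = models inside both hard caps that pass the veto.
    allowed = [
        m for m in models
        if m.compliant and m.cost <= budget and m.latency_ms <= latency_ms
    ]
    if not allowed:
        return None

    # Step 2: combine prior and residual (ASSUMPTION: the combiner is addition).
    def q(m: Model) -> float:
        return m.q_prior + weight * m.q_empirical

    # Step 3: best score wins; ties fall back to q_prior, then lower latency.
    return max(allowed, key=lambda m: (q(m), m.q_prior, -m.latency_ms))


# Hypothetical catalog, for illustration only.
catalog = [
    Model("a", cost=0.020, latency_ms=800, compliant=True, q_prior=0.90, q_empirical=0.0),
    Model("b", cost=0.010, latency_ms=1200, compliant=True, q_prior=0.80, q_empirical=0.0),
    Model("c", cost=0.004, latency_ms=600, compliant=True, q_prior=0.70, q_empirical=0.15),
    Model("d", cost=0.003, latency_ms=400, compliant=False, q_prior=0.95, q_empirical=0.0),
]

choice = route(catalog, budget=0.012, latency_ms=1500)
print(choice.name if choice else "no model clears the caps")
```

With `weight=0.0` the empirical term drops out and the choice falls back to the public benchmark alone, which is the "zero at launch" case.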
Live tail of /v1/audit/public — server-side scrubbed of internal tenants, hash-chained, keyless to read.
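For readers unfamiliar with hash chaining, here is a minimal sketch of how a verifier could check such a feed offline. The record layout (`prev_hash`, `payload`, `hash`), the all-zero genesis value, SHA-256, and the canonical-JSON encoding are all assumptions made for illustration; the actual schema of /v1/audit/public is not described here and may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # ASSUMPTION: the first record links to an all-zero hash


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so hashing is deterministic.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(entries: list, payload: dict) -> None:
    # Each new record commits to the hash of the record before it.
    prev = entries[-1]["hash"] if entries else GENESIS
    entries.append(
        {"prev_hash": prev, "payload": payload, "hash": entry_hash(prev, payload)}
    )


def verify_chain(entries: list) -> bool:
    # Recompute every hash in order; any edited or reordered record breaks the chain.
    prev = GENESIS
    for e in entries:
        if e["prev_hash"] != prev or e["hash"] != entry_hash(prev, e["payload"]):
            return False
        prev = e["hash"]
    return True


# Hypothetical records, for illustration only.
feed = []
append(feed, {"id": "evt-1", "model": "model-a"})
append(feed, {"id": "evt-2", "model": "model-b"})

ok_before = verify_chain(feed)
feed[0]["payload"]["model"] = "tampered"  # simulate an after-the-fact edit
ok_after = verify_chain(feed)
print(ok_before, ok_after)
```

Because each record's hash covers the previous hash, changing any earlier record invalidates every record after it, which is what lets a reader without a key detect tampering.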
Point your existing OpenAI / Anthropic client at our base URL. Set the model slug. You're routed.
import os

from openai import OpenAI

# Point the stock OpenAI client at the Ainfera base URL.
client = OpenAI(
    base_url="https://api.ainfera.ai/v1",
    api_key=os.environ["AINFERA_KEY"],
)

query = "Hello"  # placeholder prompt

res = client.chat.completions.create(
    model="ainfera-inference",
    messages=[{"role": "user", "content": query}],
    # Hard caps travel in the request body.
    extra_body={"caps": {"budget": 0.012, "latency_ms": 1500}},
)
print(res.choices[0].message.content)
print("verify:", res.audit.id)  # Ainfera-specific field on the response

One canonical index for agents; one OpenAPI for SDKs; one MCP server for tools; one keyless audit feed for verifiers.
Today: routing + signed audit, live. Next: wallet-debit settlement to providers once the Singapore entity and MAS Payment Services Act (PSA) licensing land, plus multi-step workflow orchestration as a first-class primitive.