Ainfera
Compare · Ainfera · Direct · OpenRouter

Three ways your agent
buys inference today.
Here's how they stack.

Provider APIs, multi-model gateways, Ainfera — all make chat calls. Different jobs underneath.

option 01

Call providers directly.

OpenAI, Anthropic, Google, etc. One SDK per vendor. You pin a model per call, you handle your own budget, you log it yourself.

used by: most prototypes and internal scripts
option 02

Use a multi-model gateway.

OpenRouter and similar. One key, many models, thin markup. Picks cheapest available host for the model you specified — not the right model for the task.

used by: hobby projects, model A/B tests, single-key portability
option 03 · ainfera

Route per task, on the record.

Per-call scoring across model classes within your hard caps, settlement, and public audit chain. Drop-in for OpenAI / Anthropic SDKs.

used by: production agents spending real money unattended
side by side

What each one does — and doesn't.

Honest table. We're not the best fit for every project. We are the right fit when an agent spends real money under no human supervision.

dimension
Direct APIopenai · anthropic · google · …
OpenRoutermulti-model gateway
Ainferarouting · audit · settlement
/// routing intelligence
Picks the model for youper-task scoring vs. you pinning a name
noyou pin a model per call
partialpicks cheapest host, not best model for task
yesscores candidates per task type, returns winner
Hard budget cap per callrefuses to violate the limit
nowhatever the model costs is what you pay
nosoft credit balance · 429 only when empty
yesreturns 409 no_eligible_model
Hard latency capexcludes slow models, no quiet downgrade
yesp50 cap measured against rolling 24h
Per-task model classese.g. reasoning-frontier · fast-classify
yeswhen the frontier shifts, your code doesn't
Fallback on provider failure429 / 5xx → next candidate, audited
you write it
cross-host fallback for same model
across model classesretry on next-ranked, within caps
/// audit · proof
Public verifiable auditanyone can verify any call
noprovider logs only · in their UI
nodashboard logs, account-only
Hashes of decision posted on chainprompt / response / candidate set
yesblock-confirmed · re-hashable locally
Records you can hand to an auditorno key, no trust required
screenshots only
CSV export
canonical, on-chain, immutable
/// cost · pricing
Markup on inferenceon top of direct provider rates
0%you pay the provider directly
~thinvaries · per provider
8% flatflat · but routes to a cheaper model when the task fits
Net cost on a typical research callbudget cap $0.012 · see worked example below
$0.0357pin claude-opus directly
$0.0064if you pin the same model
$0.0066we pick the right model · save 82% vs direct
Platform / seat / agent feeanything beyond inference cost
none
none
none
Bring your own provider keysavoid pass-through cost entirely
you ARE the key
yesBYOK mode · markup waived
yesmargin still applies on the routing service
/// integration · dev experience
OpenAI SDK compatibilitydrop in, change one env
native
yes
yes+ Anthropic SDK · + Vercel AI SDK
MCP serverdrop into Claude Desktop / Cursor
no
no
yesmcp.ainfera.ai
Workflow templatesresearch, code-review, classify, …
starter templatesgallery →
Streamingaudit metadata in final chunk
native
yes
yes
/// compliance · operations
SOC 2 Type IIaudited annually
yesmajor providers
in progress
in progressposture →
EU / AP data residencytraffic stays in region by workspace policy
per-provider
no
us-east todayeu-west / ap-south on the roadmap
Bug bountypaid disclosure program
major providers
small
responsible disclosureterms → · paid program with GA
Status & incident transparencylive system + provider status
per provider
single status
per-system + per-provider90-day strips · live →
worked example

One real research call, three bills.

Same prompt, same budget cap. The dollars are direct provider rates as of 2026-05-21.

research-deep-dive · plan step
budget cap $0.0120 · 1,284 in / 412 out · target quality 0.90+
"Compare privacy trade-offs of federated learning vs centralized fine-tuning for medical LLMs. 3-paragraph technical response."
option 01 · direct

You pin claude-opus-4-7

billitemized
tokens in        1,284 @ $14.20/M
tokens out       412 @ $42.60/M
routing markup   $0.0000
cached discount  —
audit            provider log only
-----------------------------
you pay          $0.0357
Cheapest in absolute terms only if you happened to pin the right model. You won't, every time.
option 02 · openrouter

You pin claude-opus-4-7 via gateway

billitemized
direct cost      $0.0061
gateway markup   + $0.0003
cached discount  —
audit            CSV export
caps             soft credits
-----------------------------
you pay          $0.0064
~$0.0297 cheaper than direct because they multi-host the same model. Still requires you to pick the model.
option 03 · ainfera

Pass model: "auto"

billitemized
we picked          claude-opus-4-7
direct cost        $0.0061
ainfera margin 8%  + $0.0005
cached discount    applied if cache hit
audit              on chain
caps               enforced · hard
-------------------------------
you pay            $0.0066
+ $0.0002 more than OpenRouter for the same model — and you get hard caps + a chain-of-custody record any auditor can verify.
when to choose what

Pick by what you need next quarter.

choose direct if

You're prototyping or internal.

  • One agent, one developer, one model
  • You're still figuring out the workflow
  • Latency & budget aren't life-or-death
  • You don't have audit-trail requirements yet
choose openrouter if

You want one key, many models.

  • You actively want to A/B specific models per call
  • You have your own routing logic already
  • You'll bring your own provider keys eventually
  • Audit is "nice to have", not "must prove"
choose ainfera if

Your agent spends real money.

  • Workflows touch money, PII, or anything subject to scrutiny
  • You can't afford to violate a budget or latency cap
  • You want the right model picked for each task type automatically
  • Your customer might ask you to prove what the agent did, later
migrating

It's a one-line change either way.

/// from openai direct

Already on the OpenAI SDK?
Two lines.

1Change base_url to https://api.ainfera.ai/v1.
2Set model: "auto", or keep your specific model and add a caps object.
3Optional · drop your provider keys into your workspace to pass through at 0% provider markup.
pythonfrom openai direct
from openai import OpenAI

# before
client = OpenAI(api_key=OPENAI_KEY)

# after
client = OpenAI(
  base_url="https://api.ainfera.ai/v1",
  api_key=AINFERA_KEY,
)

res = client.chat.completions.create(
  model="auto",
  messages=[...],
  extra_body={"caps": {"budget": 0.012}},
)

Total migration time: ~60 seconds. Quickstart →

/// from openrouter

Already on a gateway?
Even simpler.

1Change base_url. Same OpenAI-compatible interface.
2If you were pinning specific models, leave those — caps enforce silently. If you want routing, switch to model: "auto".
3The audit chain turns on automatically. Every call posts a hash. No code changes required.
difffrom openrouter
 client = OpenAI(
-  base_url="https://openrouter.ai/api/v1",
+  base_url="https://api.ainfera.ai/v1",
   api_key=KEY,
 )

 res = client.chat.completions.create(
-  model="anthropic/claude-opus-4-7",
+  model="auto",
   messages=[...],
 )

Your existing logs keep working in parallel. Docs →

curlverify on chain
$ curl https://audit.ainfera.ai/v1/inf_...
{
  "inference_id": "inf_...",
  "block": "<block_height>",
  "model": "claude-opus-4-7",
  "billed": "$0.0066",
  "confirmations": 12
}
next

Try it. The first call lands on chain in 60 seconds.

Read the 60-second quickstart →