Compare · Ainfera · Direct · OpenRouter
Three ways your agent
buys inference today.
Here's how they stack.
Provider APIs, multi-model gateways, Ainfera — all make chat calls. Different jobs underneath.
option 01
Call providers directly.
OpenAI, Anthropic, Google, etc. One SDK per vendor. You pin a model per call, you handle your own budget, you log it yourself.
used by: most prototypes and internal scripts
option 02
Use a multi-model gateway.
OpenRouter and similar. One key, many models, thin markup. Picks cheapest available host for the model you specified — not the right model for the task.
used by: hobby projects, model A/B tests, single-key portability
option 03 · ainfera
Route per task, on the record.
Per-call scoring across model classes within your hard caps, settlement, and public audit chain. Drop-in for OpenAI / Anthropic SDKs.
used by: production agents spending real money unattended
side by side
What each one does — and doesn't.
Honest table. We're not the best fit for every project. We are the right fit when an agent spends real money under no human supervision.
dimension
Direct APIopenai · anthropic · google · …
OpenRoutermulti-model gateway
Ainferarouting · audit · settlement
Picks the model for youper-task scoring vs. you pinning a name
noyou pin a model per call
partialpicks cheapest host, not best model for task
yesscores candidates per task type, returns winner
Hard budget cap per callrefuses to violate the limit
nowhatever the model costs is what you pay
nosoft credit balance · 429 only when empty
yesreturns 409 no_eligible_model
Hard latency capexcludes slow models, no quiet downgrade
—
—
yesp50 cap measured against rolling 24h
Per-task model classese.g. reasoning-frontier · fast-classify
—
—
yeswhen the frontier shifts, your code doesn't
Fallback on provider failure429 / 5xx → next candidate, audited
you write it
cross-host fallback for same model
across model classesretry on next-ranked, within caps
Public verifiable auditanyone can verify any call
noprovider logs only · in their UI
nodashboard logs, account-only
Hashes of decision posted on chainprompt / response / candidate set
—
—
yesblock-confirmed · re-hashable locally
Records you can hand to an auditorno key, no trust required
screenshots only
CSV export
canonical, on-chain, immutable
Markup on inferenceon top of direct provider rates
0%you pay the provider directly
~thinvaries · per provider
8% flatflat · but routes to a cheaper model when the task fits
Net cost on a typical research callbudget cap $0.012 · see worked example below
$0.0357pin claude-opus directly
$0.0064if you pin the same model
$0.0066we pick the right model · save 82% vs direct
Platform / seat / agent feeanything beyond inference cost
none
none
none
Bring your own provider keysavoid pass-through cost entirely
—you ARE the key
yesBYOK mode · markup waived
yesmargin still applies on the routing service
/// integration · dev experience
OpenAI SDK compatibilitydrop in, change one env
native
yes
yes+ Anthropic SDK · + Vercel AI SDK
MCP serverdrop into Claude Desktop / Cursor
no
no
yesmcp.ainfera.ai
Workflow templatesresearch, code-review, classify, …
—
—
Streamingaudit metadata in final chunk
native
yes
yes
/// compliance · operations
SOC 2 Type IIaudited annually
yesmajor providers
in progress
EU / AP data residencytraffic stays in region by workspace policy
per-provider
no
us-east todayeu-west / ap-south on the roadmap
Bug bountypaid disclosure program
major providers
small
responsible disclosureterms → · paid program with GA Status & incident transparencylive system + provider status
per provider
single status
per-system + per-provider90-day strips · live → worked example
One real research call, three bills.
Same prompt, same budget cap. The dollars are direct provider rates as of 2026-05-21.
research-deep-dive · plan step
budget cap $0.0120 · 1,284 in / 412 out · target quality 0.90+
"Compare privacy trade-offs of federated learning vs centralized fine-tuning for medical LLMs. 3-paragraph technical response."
option 01 · direct
You pin claude-opus-4-7
billitemized
tokens in 1,284 @ $14.20/M
tokens out 412 @ $42.60/M
routing markup $0.0000
cached discount —
audit provider log only
-----------------------------
you pay $0.0357
Cheapest in absolute terms only if you happened to pin the right model. You won't, every time.
option 02 · openrouter
You pin claude-opus-4-7 via gateway
billitemized
direct cost $0.0061
gateway markup + $0.0003
cached discount —
audit CSV export
caps soft credits
-----------------------------
you pay $0.0064
~$0.0297 cheaper than direct because they multi-host the same model. Still requires you to pick the model.
option 03 · ainfera
Pass model: "auto"
billitemized
we picked claude-opus-4-7
direct cost $0.0061
ainfera margin 8% + $0.0005
cached discount applied if cache hit
audit on chain
caps enforced · hard
-------------------------------
you pay $0.0066
+ $0.0002 more than OpenRouter for the same model — and you get hard caps + a chain-of-custody record any auditor can verify.
when to choose what
Pick by what you need next quarter.
choose direct if
You're prototyping or internal.
- One agent, one developer, one model
- You're still figuring out the workflow
- Latency & budget aren't life-or-death
- You don't have audit-trail requirements yet
choose openrouter if
You want one key, many models.
- You actively want to A/B specific models per call
- You have your own routing logic already
- You'll bring your own provider keys eventually
- Audit is "nice to have", not "must prove"
choose ainfera if
Your agent spends real money.
- Workflows touch money, PII, or anything subject to scrutiny
- You can't afford to violate a budget or latency cap
- You want the right model picked for each task type automatically
- Your customer might ask you to prove what the agent did, later
migrating
It's a one-line change either way.
/// from openai direct
Already on the OpenAI SDK?
Two lines.
1Change base_url to https://api.ainfera.ai/v1.
2Set model: "auto", or keep your specific model and add a caps object.
3Optional · drop your provider keys into your workspace to pass through at 0% provider markup.
pythonfrom openai direct
from openai import OpenAI
# before
client = OpenAI(api_key=OPENAI_KEY)
# after
client = OpenAI(
base_url="https://api.ainfera.ai/v1",
api_key=AINFERA_KEY,
)
res = client.chat.completions.create(
model="auto",
messages=[...],
extra_body={"caps": {"budget": 0.012}},
)
Total migration time: ~60 seconds. Quickstart →
/// from openrouter
Already on a gateway?
Even simpler.
1Change base_url. Same OpenAI-compatible interface.
2If you were pinning specific models, leave those — caps enforce silently. If you want routing, switch to model: "auto".
3The audit chain turns on automatically. Every call posts a hash. No code changes required.
difffrom openrouter
client = OpenAI(
- base_url="https://openrouter.ai/api/v1",
+ base_url="https://api.ainfera.ai/v1",
api_key=KEY,
)
res = client.chat.completions.create(
- model="anthropic/claude-opus-4-7",
+ model="auto",
messages=[...],
)
Your existing logs keep working in parallel. Docs →
curlverify on chain
$ curl https://audit.ainfera.ai/v1/inf_...
{
"inference_id": "inf_...",
"block": "<block_height>",
"model": "claude-opus-4-7",
"billed": "$0.0066",
"confirmations": 12
}