Compare · Ainfera · Direct · OpenRouter

Three ways your agent
buys inference today.
Here's how they stack.

Provider APIs, multi-model gateways, Ainfera — all make chat calls. Different jobs underneath.

option 01

Call providers directly.

OpenAI, Anthropic, Google, etc. One SDK per vendor. You pin a model per call, you handle your own budget, you log it yourself.

used by: most prototypes and internal scripts

option 02

Use a multi-model gateway.

OpenRouter and similar. One key, many models, thin markup. Picks cheapest available host for the model you specified — not the right model for the task.

used by: hobby projects, model A/B tests, single-key portability

option 03 · ainfera

Route per task, on the record.

Per-call scoring across model classes within your hard caps, settlement, and public audit chain. Drop-in for OpenAI / Anthropic SDKs.

used by: production agents spending real money unattended

side by side

What each one does — and doesn't.

Honest table. We're not the best fit for every project. We are the right fit when an agent spends real money under no human supervision.

dimension

Direct APIopenai · anthropic · google · …

OpenRoutermulti-model gateway

Ainferarouting · audit · settlement

/// routing intelligence

Picks the model for youper-task scoring vs. you pinning a name

noyou pin a model per call

partialpicks cheapest host, not best model for task

yesscores candidates per task type, returns winner

Hard budget cap per callrefuses to violate the limit

nowhatever the model costs is what you pay

nosoft credit balance · 429 only when empty

yesreturns 409 no_eligible_model

Hard latency capexcludes slow models, no quiet downgrade

—

yesp50 cap measured against rolling 24h

Per-task model classese.g. reasoning-frontier · fast-classify

—

yeswhen the frontier shifts, your code doesn't

Fallback on provider failure429 / 5xx → next candidate, audited

you write it

cross-host fallback for same model

across model classesretry on next-ranked, within caps

/// audit · proof

Public verifiable auditanyone can verify any call

noprovider logs only · in their UI

nodashboard logs, account-only

on chainverify any inference id →

Hashes of decision posted on chainprompt / response / candidate set

—

yesblock-confirmed · re-hashable locally

Records you can hand to an auditorno key, no trust required

screenshots only

CSV export

canonical, on-chain, immutable

/// cost · pricing

Markup on inferenceon top of direct provider rates

0%you pay the provider directly

~thinvaries · per provider

8% flatflat · but routes to a cheaper model when the task fits

Net cost on a typical research callbudget cap $0.012 · see worked example below

$0.0357pin claude-opus directly

$0.0064if you pin the same model

$0.0066we pick the right model · save 82% vs direct

Platform / seat / agent feeanything beyond inference cost

none

Bring your own provider keysavoid pass-through cost entirely

—you ARE the key

yesBYOK mode · markup waived

yesmargin still applies on the routing service

/// integration · dev experience

OpenAI SDK compatibilitydrop in, change one env

native

yes

yes+ Anthropic SDK · + Vercel AI SDK

MCP serverdrop into Claude Desktop / Cursor

yesmcp.ainfera.ai

Workflow templatesresearch, code-review, classify, …

—

starter templatesgallery →

Streamingaudit metadata in final chunk

native

yes

/// compliance · operations

SOC 2 Type IIaudited annually

yesmajor providers

in progress

in progressposture →

EU / AP data residencytraffic stays in region by workspace policy

per-provider

us-east todayeu-west / ap-south on the roadmap

Bug bountypaid disclosure program

major providers

small

responsible disclosureterms → · paid program with GA

Status & incident transparencylive system + provider status

per provider

single status

per-system + per-provider90-day strips · live →

worked example

One real research call, three bills.

Same prompt, same budget cap. The dollars are direct provider rates as of 2026-05-21.

research-deep-dive · plan step

budget cap $0.0120 · 1,284 in / 412 out · target quality 0.90+

"Compare privacy trade-offs of federated learning vs centralized fine-tuning for medical LLMs. 3-paragraph technical response."

option 01 · direct

You pin claude-opus-4-7

billitemized

tokens in        1,284 @ $14.20/M
tokens out       412 @ $42.60/M
routing markup   $0.0000
cached discount  —
audit            provider log only
-----------------------------
you pay          $0.0357

Cheapest in absolute terms only if you happened to pin the right model. You won't, every time.

option 02 · openrouter

You pin claude-opus-4-7 via gateway

billitemized

direct cost      $0.0061
gateway markup   + $0.0003
cached discount  —
audit            CSV export
caps             soft credits
-----------------------------
you pay          $0.0064

~$0.0297 cheaper than direct because they multi-host the same model. Still requires you to pick the model.

option 03 · ainfera

Pass model: "auto"

billitemized

we picked          claude-opus-4-7
direct cost        $0.0061
ainfera margin 8%  + $0.0005
cached discount    applied if cache hit
audit              on chain
caps               enforced · hard
-------------------------------
you pay            $0.0066

+ $0.0002 more than OpenRouter for the same model — and you get hard caps + a chain-of-custody record any auditor can verify.

when to choose what

Pick by what you need next quarter.

choose direct if

You're prototyping or internal.

One agent, one developer, one model
You're still figuring out the workflow
Latency & budget aren't life-or-death
You don't have audit-trail requirements yet

choose openrouter if

You want one key, many models.

You actively want to A/B specific models per call
You have your own routing logic already
You'll bring your own provider keys eventually
Audit is "nice to have", not "must prove"

choose ainfera if

Your agent spends real money.

Workflows touch money, PII, or anything subject to scrutiny
You can't afford to violate a budget or latency cap
You want the right model picked for each task type automatically
Your customer might ask you to prove what the agent did, later

migrating

It's a one-line change either way.

/// from openai direct

Already on the OpenAI SDK?
Two lines.

1Change base_url to https://api.ainfera.ai/v1.

2Set model: "auto", or keep your specific model and add a caps object.

3Optional · drop your provider keys into your workspace to pass through at 0% provider markup.

pythonfrom openai direct

from openai import OpenAI

# before
client = OpenAI(api_key=OPENAI_KEY)

# after
client = OpenAI(
  base_url="https://api.ainfera.ai/v1",
  api_key=AINFERA_KEY,
)

res = client.chat.completions.create(
  model="auto",
  messages=[...],
  extra_body={"caps": {"budget": 0.012}},
)

Total migration time: ~60 seconds. Quickstart →

/// from openrouter

Already on a gateway?
Even simpler.

1Change base_url. Same OpenAI-compatible interface.

2If you were pinning specific models, leave those — caps enforce silently. If you want routing, switch to model: "auto".

3The audit chain turns on automatically. Every call posts a hash. No code changes required.

difffrom openrouter

 client = OpenAI(
-  base_url="https://openrouter.ai/api/v1",
+  base_url="https://api.ainfera.ai/v1",
   api_key=KEY,
 )

 res = client.chat.completions.create(
-  model="anthropic/claude-opus-4-7",
+  model="auto",
   messages=[...],
 )

Your existing logs keep working in parallel. Docs →

curlverify on chain

$ curl https://audit.ainfera.ai/v1/inf_...
{
  "inference_id": "inf_...",
  "block": "<block_height>",
  "model": "claude-opus-4-7",
  "billed": "$0.0066",
  "confirmations": 12
}

Try it. The first call lands on chain in 60 seconds.

Read the 60-second quickstart →

Three ways your agentbuys inference today.Here's how they stack.

Call providers directly.

Use a multi-model gateway.

Route per task, on the record.

What each one does — and doesn't.

One real research call, three bills.

You pin claude-opus-4-7

You pin claude-opus-4-7 via gateway

Pass model: "auto"

Pick by what you need next quarter.

You're prototyping or internal.

You want one key, many models.

Your agent spends real money.

It's a one-line change either way.

Already on the OpenAI SDK?Two lines.

Already on a gateway?Even simpler.

Try it. The first call lands on chain in 60 seconds.

Three ways your agent
buys inference today.
Here's how they stack.

Already on the OpenAI SDK?
Two lines.

Already on a gateway?
Even simpler.