How does usage billing work?

Usage charges (beyond your plan's included limits) are the sum of the enabled components:

Usage = base routing + (optional) semantic cache + (optional) tracing/logs + (optional) agent runtime + (optional) evaluations

Base routing (tiered, per million requests): 0–1M $1.50; 1–10M $1.20; 10M+ $1.00
Semantic cache: write $0.10/M tokens; read $0.01/M tokens; storage $0.15/GB/mo
Tracing: $0.00015/trace; log storage $0.08/GB/mo
Agent runtime: $0.0012/sec compute; $0.08/GB-hour memory
Evaluations: $0.002/eval (<1k); $0.0015/eval (batch)
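As a rough illustration of the tiered base routing rates, the sketch below assumes the tiers are graduated, meaning each band of monthly requests is billed at its own rate. That reading matches the Growth-stage SaaS sample bill, which charges the first 1M requests at $1.50/M and the next 0.2M at $1.20/M. The name `routing_cost` is hypothetical and not part of any product API.

```python
# Hypothetical helper: graduated (marginal) tier pricing for base routing.
# Assumption: each band of monthly requests is billed at its own rate.
ROUTING_TIERS = [
    (1_000_000, 1.50),      # first 1M requests, USD per 1M
    (10_000_000, 1.20),     # requests from 1M up to 10M
    (float("inf"), 1.00),   # requests beyond 10M
]


def routing_cost(requests: int) -> float:
    """Return the base routing charge in USD for one month of requests."""
    cost = 0.0
    lower = 0
    for upper, rate_per_million in ROUTING_TIERS:
        if requests <= lower:
            break
        # Only the requests that fall inside this band are billed at its rate.
        billable = min(requests, upper) - lower
        cost += billable / 1_000_000 * rate_per_million
        lower = upper
    return round(cost, 2)


print(routing_cost(1_200_000))  # 1.74
```

At 1.2M requests this gives $1.50 for the first band plus $0.24 for the second, which is why routing is usually a small part of the total bill.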

What savings can I expect?

Most teams see roughly an 18–35% reduction in token spend from semantic caching, routing to cheaper models, and automatic fallbacks.

Do you offer private deployment?

Yes. Enterprise plans can be deployed on dedicated Vercel projects or private clouds with data residency and custom SLAs.

Honest usage-based pricing · effective from day one

Simple pricing that scales with usage

Start free, upgrade when usage grows. Transparent plans for builders, teams, and enterprise.

Monthly stays flexible. Switch to annual to save 20%.

Free

For individuals and evaluation

1 project, 1 environment
Basic routing + budgets
Core metrics (tokens, latency, errors)
Monthly usage cap (~$20 equiv.)
Community support

Start free

Pro

Team

$199/mo

For growing product teams

Up to 20 seats included
Team RBAC + audit v1
Lower usage rates
Shared policy/prompt library
99.9% SLA

Upgrade to Team

Enterprise

From $20k/yr

For regulated production workloads

SSO/SAML + fine-grained RBAC
Full audit logs and retention
Tenant isolation + residency options
PII redaction and safety policies
99.95%+ SLA and dedicated support

Book enterprise POC

Modeled across 180+ composite workload profiles

Aurelis Bank

Maple Health

CodeForge Labs

Northstar Commerce

Helios Insurance

Vector Mobility

Sample bills

Three realistic scenarios, modeled bills

Numbers below come from the published price list, typical usage, and composite workload assumptions. Real bills vary with cache hit rate, token distribution, and model-pool strategy.

Indie hacker

Free to Pro

Side-project chatbot · 50K req/mo · 800 avg tokens · cost goal · cache on

about $29/mo

Pro plan base	$29
50K req routing (in plan)	$0
Token spend (cache 32%)	$5–8
Trace retention 7d	$0

Growth-stage SaaS

Team

B2B copilot · 1.2M req/mo · 1.5K avg tokens · balanced strategy · agent runs

about $850–1,100/mo

Team plan base	$199
0–1M req @ $1.50/M	$1.50
1–1.2M req @ $1.20/M	$0.24
Token + cache + trace	$420–620
Agent runtime (15K runs)	$190–270
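To make the arithmetic behind this modeled bill explicit, here is a minimal sketch that sums the line items above into a low and a high monthly total. The figures are copied from the sample bill. The dictionary layout and variable names are illustrative only.

```python
# Line items copied from the Growth-stage SaaS sample bill (USD per month).
# Each entry is a (low, high) range; fixed charges repeat the same value.
line_items = {
    "Team plan base": (199.00, 199.00),
    "Routing, 0-1M req": (1.50, 1.50),
    "Routing, 1-1.2M req": (0.24, 0.24),
    "Token + cache + trace": (420.00, 620.00),
    "Agent runtime (15K runs)": (190.00, 270.00),
}

low = round(sum(lo for lo, _ in line_items.values()), 2)
high = round(sum(hi for _, hi in line_items.values()), 2)

print(f"${low:,.2f} to ${high:,.2f} per month")
```

The summed range lands close to the headline figure, which is presented as an approximation.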

Regulated enterprise

Enterprise

Bank support copilot · 25M req/mo · SSO + BAA + EU residency · 99.95% SLA

From $20k/yr

Annual Enterprise license	$20k+
Volume usage (committed)	Per contract
Dedicated SE + Slack	Included
BAA + DPA + MSA	Included

Modeled benchmark suite

Put pricing inside real workload shapes

The numbers below are realistic planning benchmarks built from composite workload profiles, deterministic replay assumptions, and the public pricing model. They are not audited customer production claims.

Workload	Baseline cost	Routed cost	Savings	Policy
Support copilot: 2.8M requests/mo · 980 avg tokens · 38% repeatable intents	$7,840	$5,360	31.6%	intent router + semantic cache + two-step fallback
Knowledge QA / RAG: 640K queries/mo · 2.7K avg tokens · 21% low-confidence retrieval	$11,420	$8,030	29.7%	retrieval-confidence routing + eval-backed fallback
Sales / CRM agent: 1.1M generations/mo · 620 avg tokens · 6 quality gates	$3,880	$2,960	23.7%	lead-tier routing + quality gates + CRM sync
Developer assistant: 360K agent runs/mo · 11 tool calls/run · 42% sandboxed writes	$18,200	$13,940	23.4%	agent runtime + scoped MCP tools + trace replay
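The Savings column follows directly from the two cost columns: savings = (baseline − routed) / baseline. A small sketch that recomputes it from the modeled figures in the table (the function name `savings_pct` is hypothetical):

```python
# Baseline and routed monthly costs (USD) copied from the benchmark table.
benchmarks = [
    ("Support copilot", 7_840, 5_360),
    ("Knowledge QA / RAG", 11_420, 8_030),
    ("Sales / CRM agent", 3_880, 2_960),
    ("Developer assistant", 18_200, 13_940),
]


def savings_pct(baseline: float, routed: float) -> float:
    """Savings as a percentage of the baseline cost, rounded to one decimal."""
    return round((baseline - routed) / baseline * 100, 1)


for name, baseline, routed in benchmarks:
    print(f"{name}: {savings_pct(baseline, routed)}%")
```

The same formula can be applied to your own baseline and replayed costs from a POC.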

Full plan comparison

What's in each tier

Feature	Free	Pro	Team	Enterprise
Routing
Multi-model routing	Included	Included	Included	Included
Auto fallback chains	Included	Included	Included	Included
Per-request budget caps	1 policy	Unlimited	Unlimited	Unlimited
Semantic cache	Not included	Included	Included	Included
Custom routing policies	Not included	Included	Included	Included
Agents
Agent runtime (sandboxed)	Trial	Included	Included	Included
Tool / MCP registry	Not included	Included	Included	Included
Streaming + advisor mode	Not included	Included	Included	Included
Observability
Trace retention	24h	7d	30d	90d
OpenTelemetry export	Not included	Included	Included	Included
Webhooks	Not included	Included	Included	Included
Team & access
Seats	1	5 included	20 included	Unlimited
Team RBAC + audit	Not included	Not included	Included	Included
SSO / SAML	Not included	Not included	Not included	Included
SCIM provisioning	Not included	Not included	Not included	Included
Compliance
PII detection / redaction	Detect-only	Included	Included	Included
Tenant isolation	Not included	Not included	Included	Included
Data residency (US / EU / APAC)	Not included	Not included	Not included	Included
BAA (HIPAA)	Not included	Not included	Not included	Included
DPA + custom MSA	Not included	Not included	Included	Included
Support & SLA
Support channel	Community	Email	Priority email	Dedicated SE + Slack
Response SLA	—	1 business day	4 hours	1 hour (P0)
Uptime SLA	—	—	99.9%	99.95%+

Usage line items (a la carte)

What you pay beyond included plan limits

Each component is priced separately, so you are charged only for what you use. Free has a monthly cap; on Pro / Team, usage beyond the included limits is billed at the rates below; Enterprise runs on committed-use contracts.

Component	Unit	Price
Base routing — tier 1	0–1M req / mo	$1.50 / 1M
Base routing — tier 2	1–10M req / mo	$1.20 / 1M
Base routing — tier 3	10M+ req / mo	$1.00 / 1M
Semantic cache — write	per 1M tokens	$0.10
Semantic cache — read	per 1M tokens	$0.01
Cache storage	per GB / mo	$0.15
Tracing	per trace	$0.00015
Trace storage	per GB / mo	$0.08
Agent runtime — compute	per second	$0.0012
Agent runtime — memory	per GB-hour	$0.08
Evaluations — interactive	< 1k / mo	$0.002
Evaluations — batch	≥ 1k / mo	$0.0015

Prices in USD. For volume / committed-use discounts see the Enterprise plan.
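As a sketch of how the optional (non-routing) line items above could be combined for one month, the example below applies the listed unit prices to a made-up usage profile. The `MonthlyUsage` fields, the function name, and the usage numbers are all hypothetical. It also assumes the batch evaluation rate applies to every evaluation once monthly volume reaches 1k, which the table does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    """Hypothetical usage counters for one billing month."""
    cache_write_tokens: int = 0
    cache_read_tokens: int = 0
    cache_storage_gb: float = 0.0
    traces: int = 0
    trace_storage_gb: float = 0.0
    agent_compute_seconds: float = 0.0
    agent_memory_gb_hours: float = 0.0
    evals: int = 0


def optional_component_charges(u: MonthlyUsage) -> dict[str, float]:
    """Apply the published unit prices (USD) to each optional component."""
    # Assumption: one rate for the whole month, chosen by total eval volume.
    eval_rate = 0.002 if u.evals < 1_000 else 0.0015
    return {
        "semantic cache": round(
            u.cache_write_tokens / 1_000_000 * 0.10
            + u.cache_read_tokens / 1_000_000 * 0.01
            + u.cache_storage_gb * 0.15, 2),
        "tracing": round(u.traces * 0.00015 + u.trace_storage_gb * 0.08, 2),
        "agent runtime": round(
            u.agent_compute_seconds * 0.0012
            + u.agent_memory_gb_hours * 0.08, 2),
        "evaluations": round(u.evals * eval_rate, 2),
    }


# Made-up usage profile, for illustration only.
usage = MonthlyUsage(
    cache_write_tokens=200_000_000, cache_read_tokens=1_000_000_000,
    cache_storage_gb=20, traces=1_000_000, trace_storage_gb=50,
    agent_compute_seconds=100_000, agent_memory_gb_hours=500, evals=10_000,
)
charges = optional_component_charges(usage)
print(charges, round(sum(charges.values()), 2))
```

Base routing is tiered and would be added on top of these amounts.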

Estimate your ROI

Slide to see your expected bill + savings

Computed from published model prices + typical cache hit rate. Actual savings depend on your prompt distribution.

Pricing ROI Calculator

Monthly requests

200,000 req

10k – 2M

Avg tokens / request

850 tokens

Optimization strategy

Baseline cost

$2,040.00

Direct single-model calls

Router cost

$40.00

Includes routing + cache

Net savings

$327.20

16.0% vs baseline
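The default calculator state above can be reproduced with simple arithmetic. The sketch below is an assumption about how the figures relate, not the calculator's actual implementation: the $12 per 1M tokens blended model price and the 18% gross savings rate are back-solved from the displayed numbers, not published values, and `roi_estimate` is a hypothetical name.

```python
def roi_estimate(requests: int, avg_tokens: int,
                 price_per_million_tokens: float,
                 gross_savings_rate: float,
                 router_cost: float) -> dict[str, float]:
    """Estimate baseline spend and net savings for one month (USD)."""
    # Baseline: every request goes straight to a single model.
    baseline = requests * avg_tokens / 1_000_000 * price_per_million_tokens
    # Net savings: token spend avoided, minus what the router itself costs.
    net = baseline * gross_savings_rate - router_cost
    return {
        "baseline": round(baseline, 2),
        "net_savings": round(net, 2),
        "net_pct": round(net / baseline * 100, 1),
    }


# Inputs shown in the calculator, plus the two back-solved assumptions.
estimate = roi_estimate(
    requests=200_000, avg_tokens=850,
    price_per_million_tokens=12.00,  # assumed blended model price
    gross_savings_rate=0.18,         # assumed savings before router cost
    router_cost=40.00,
)
print(estimate)
```

With these inputs the sketch yields a $2,040.00 baseline and $327.20 net savings, or 16.0% of baseline, matching the figures shown.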

How we think about pricing

Four commitments

No surprise bills

Hard budget caps per request, per project, and per account. Hit 80% → email; hit 100% → router refuses to spend more.

Pay for value, not seats

Free seats up to plan limit; only usage scales. Adding a teammate doesn't auto-bill you.

Downgrade any time

Annual customers get pro-rated refunds. No multi-year lock-ins on Pro / Team plans.

Compliance is included

PII detection and redaction are part of every paid plan; audit logs and a DPA are included from Team upward, not sold as add-ons.
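The budget-cap behavior described under "No surprise bills" (email at 80%, refuse at 100%) can be pictured as a simple threshold check. This is a minimal sketch under stated assumptions: the enum, the function name, and the use of spend-to-date as the only input are hypothetical and do not describe the router's actual implementation.

```python
from enum import Enum


class BudgetAction(Enum):
    ALLOW = "allow"
    ALLOW_AND_EMAIL = "allow_and_email"  # at or above 80% of the cap
    REFUSE = "refuse"                    # at or above 100% of the cap


def check_budget(spent: float, cap: float) -> BudgetAction:
    """Decide what to do with the next request given spend so far (USD)."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    if spent >= cap:
        return BudgetAction.REFUSE
    if spent >= 0.8 * cap:
        return BudgetAction.ALLOW_AND_EMAIL
    return BudgetAction.ALLOW


print(check_budget(50, 100))   # BudgetAction.ALLOW
print(check_budget(80, 100))   # BudgetAction.ALLOW_AND_EMAIL
print(check_budget(100, 100))  # BudgetAction.REFUSE
```

The same check could be applied independently at the request, project, and account levels, with the strictest result winning.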

Enterprise

Regulated workloads, custom SLA, residency on demand

From $20k/yr, includes SSO/SAML, SCIM, BAA, custom MSA, dedicated SE, direct Slack channel. Committed-use discounts and multi-year contracts available.

✓ Data residency: US / EU / APAC (US + EU at public beta)
✓ 99.95%+ uptime SLA with credits
✓ 1-hour P0 response, dedicated Slack channel
✓ HIPAA BAA, custom DPA, signed MSA
✓ Committed-use discount tiers per contract
✓ Private deployment / VPC peering options

Talk to founders · View Trust Center

First reply within 1 business day.

Composite buyer language about pricing

In the composite benchmark, migrating 14 AI workloads behind one routing layer cut token spend by 31% and reduced P95 latency by 22% while keeping every decision traceable for audit review.

Priya Ramanathan

VP, AI Platform · Aurelis Bank

The router, guardrails, and agent runtime model replaced four homegrown systems in the rollout plan, taking a HIPAA-bound triage copilot from mirror traffic to governed launch in 47 days.

Daniel Yoon

Chief Technology Officer · Maple Health

The marketplace pricing model shows how dimensional usage, model quality, and seat controls can become one invoice instead of three disconnected billing systems.

Alessia Marchetti

Founder & CEO · CodeForge Labs

Procurement and POC FAQ

Can we validate savings on our own traffic before buying?

Yes. A typical POC mirrors 1-2 weeks of traffic, builds a single-model baseline, then replays candidate policies without changing end-user behavior.

How do you separate public claims from private diligence material?

Public pages only show product architecture, methodology, and modeled examples. Customer-specific benchmarks, contracts, and compliance evidence are shared by request or under NDA.

What data is retained in traces?

By default traces keep request metadata, routing decisions, token counts, model choice, errors, and timing. Prompt and output content retention can be shortened or disabled per policy.

Do you support regulated workloads?

The platform is designed for regulated review with SSO, RBAC, audit export, PII controls, residency options, DPA workflows, and BAA templates for eligible enterprise customers.

Which teams need to be involved in evaluation?

The strongest evaluations include the product owner, platform/FinOps, security, legal/procurement, and one engineering owner who can compare traces against the current stack.

FAQ

Ready to start?

Free plan is instant · Pro / Team come with a 14-day full refund · Enterprise goes straight to the founding team.

Start free · Contact sales