Stop Selling Model Names. Sell Uptime: Multi-Provider Routing with Client-Facing SLOs
The Problem Nobody Talks About
Every AI provider goes down. Not maybe. Not occasionally. Regularly.
November 25, 2024: OpenAI suffered widespread timeouts and 503 errors for hours
September 2025: Anthropic published postmortems for three separate Claude API incidents
2024-2025: Cloudflare global incidents cascaded into half the AI services on the internet
If your revenue depends on AI output, a single-provider architecture is a single point of failure with your name on it.
The Solution: Reliability as a Feature
Stop leading with "We use GPT-4" or "We're on Claude." Start leading with numbers:
99.5% of requests succeed
P95 latency under 2.5 seconds
Average cost per request under $0.015
That's a promise a client can hold you to—and it makes you worth more than the person who just says "we use the best model."
The Technical Stack
1. Two-Provider Router with LiteLLM
Not five providers. Not a fancy model cascade. Two.
Primary gets weight of 9, secondary gets weight of 1
LiteLLM retries in-group once, then fails over automatically
Your app hits one endpoint—routing happens behind the proxy
Keep a bypass switch: BYPASS_ROUTER=true for 30-second rollback
Set explicit routing order (primary first, secondary only on failure)
2-second stream timeout for time-to-first-token
Pin providers for latency-critical paths
2. Budget Guardrails and Cost Control
The problem: Secondary providers can be 3x more expensive per token
The solution: Budget guardrails in LiteLLM
Maximum cost per request
Maximum tokens in/out
Graceful degradation (truncate context, switch to cheaper model, return cached response)
Tag every request: tenant ID, feature, provider, tokens, cost
Pipe into Langfuse or Helicone (both have free tiers)
P95 latency over target for 15 minutes → page
Success rate below target for 5 minutes → page
Average cost per request over budget for 15 minutes → page
3. Travel-Mode Cache
The reality: Airport throttling, café wifi drops, connectivity chaos
The solution: Write-through cache + service workers
Every router response written to local cache (Redis, SQLite)
Keyed on normalized prompt version
Service worker intercepts fetch requests, falls back to cache on network failure
Bonus: 60%+ cache hit rates on repetitive prompts = major cost savings
Provider-side optimization:
Anthropic prompt caching for stable blocks (system instructions, tool definitions)
Default 5-minute TTL, optional 1-hour cache
Reduces both latency and input token cost
Client-Facing SLOs
The Language That Wins Deals
Most AI agency proposals: "We use state-of-the-art AI models"
> "99.5% success rate, p95 latency under 2.5 seconds, average cost per request under $0.015, measured over a rolling 30-day window"
CTO understands your architecture
VP of Operations understands "99.5% uptime"
Different audiences, different languages
SLO vs SLA Distinction
SLI = The measurement (p95 latency)
SLO = The target ("95% of requests complete in under 2.5 seconds")
SLA = The contract (legal commitment with penalties)
Publish SLOs, not SLAs. SLO = transparency commitment. SLA = legal obligation with penalties.
Error Budget Framework
If your target is 99.5% success rate over 30 days:
You're allowed to fail on 0.5% of requests
On 10,000 requests/month = 50 allowed failures
Spend budget on deploys, experiments, provider hiccups
When it's gone, freeze changes and stabilize
The 30-Minute Friday Drill
Why Manual Drills Matter
Don't automate the drill. The point isn't to test the system—it's to test you.
AWS calls these "chaos game days." Google calls them "Wheel of Misfortune exercises."
Drill Structure (30 minutes)
Three roles (even if you're playing all three):
Drill lead runs the clock
Operator flips the switch
Scribe captures what happened
Revoke your primary provider's API key
Watch the router fail over
Confirm p95 stays within target
Restore the key and verify everything's green
Tie results to error budget: If failover took longer than expected or success rate dipped below SLO, that's a finding. Log it, fix it, run again next quarter.
When It's Boring, It Works
The goal: Make reliability boring.
If your infrastructure is exciting, something's wrong. Ship the boring infrastructure. Sell the boring promise. Win the clients who care about reliability more than hype.
Action Items
Stand up the router with two providers
Set the three alerts
Run the drill Friday
Layer in the cache
Add SLO language to proposals
Implement full observability
Resources
Download the complete Reliability SLO Kit:
SLO one-pager template
Budget guardrail sheet with alert thresholds
Router config
Cache recipe
30-minute drill SOP with rollback steps
Client-safe proposal language
Available on the Resources page
Legal disclaimer: The SLO/SOW language provided is template language, not legal advice. Have your counsel review before shipping to clients.