Build a Three-Tier AI Failover That Survives Provider Outages
The Recent Incidents That Changed Everything
April 6-7, 2026: Anthropic's Claude experienced back-to-back days of elevated errors
- April 6: 15:00-16:30 UTC, login errors affecting Claude.ai and Claude Code
- April 7: 14:32-15:12 UTC, same symptoms across login, chats, and voice

March 4, 2026: OpenAI logged elevated API error rates for 30 minutes across multiple models due to simultaneous infrastructure actions
March 16, 2026: Google announced Project Spend Caps for Gemini API with ~10-minute enforcement delays
The Three-Tier Failover Architecture
Hot Tier
- Same provider, different model or endpoint
- Example: Claude Sonnet fails → switch to Claude Haiku
- Handles partial outages where some models still respond
- Uses circuit breakers to detect consecutive failures

Warm Tier
- Completely different provider
- Example: Anthropic down → route to OpenAI or Gemini
- Requires an OpenAI-compatible gateway layer for request normalization
- Test tool calling and JSON mode differences beforehand

Cold Tier
- Graceful degradation + human-in-the-loop
- Requests go into a Redis queue with BullMQ
- Returns 202 (accepted, processing) to the client
- Triggers notifications to the ops team
- Pre-written message templates for client communication

Key Technical Patterns
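Before the individual patterns, here is one way the three tiers above might be wired together. This is a sketch, not the article's actual implementation: the route table entries, the `call` and `enqueue` callables, and the response shapes are all illustrative placeholders you would swap for real SDK calls and a real queue.

```python
# Illustrative three-tier router. Route entries are placeholders;
# a real table would carry environment-based URLs and credentials.
ROUTES = [
    {"tier": "hot",  "provider": "anthropic", "model": "claude-haiku"},
    {"tier": "warm", "provider": "openai",    "model": "fallback-model"},
]

def route_request(payload, call, enqueue):
    """Try hot, then warm; on total failure, fall back to the cold queue.

    `call(route, payload)` performs the provider request and raises on
    failure; `enqueue(payload)` pushes to a Redis/BullMQ-style queue.
    Both are supplied by the caller in this sketch.
    """
    for route in ROUTES:
        try:
            return {"status": 200, "body": call(route, payload)}
        except Exception:
            continue  # in practice, circuit breakers + retries wrap `call`
    # Cold tier: accept the work, process later, notify ops.
    enqueue(payload)
    return {"status": 202, "body": "accepted, processing"}
```

Returning 202 with a queued payload is what lets the client see "accepted, processing" instead of an error while the ops team is notified.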
Circuit Breakers (Martin Fowler pattern)
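A minimal stdlib-only sketch of the breaker, using the thresholds given below (open after 5 failures, probe after 30 seconds, close after 2 successes). In production you would reach for a library like PyBreaker or Cockatiel rather than hand-rolling this.

```python
import time

class CircuitBreaker:
    """Per-route circuit breaker sketch: closed -> open -> half-open."""

    def __init__(self, fail_max=5, reset_timeout=30.0, close_after=2):
        self.fail_max = fail_max          # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds before a half-open probe
        self.close_after = close_after    # successes needed to close again
        self.failures = 0
        self.successes = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow(self):
        """Should the next request be attempted on this route?"""
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # let a probe request through
                return True
            return False
        return True

    def record(self, ok):
        """Report the outcome of an attempt."""
        if ok:
            self.successes += 1
            self.failures = 0
            if self.state == "half-open" and self.successes >= self.close_after:
                self.state = "closed"
        else:
            self.successes = 0
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.fail_max:
                self.state = "open"
                self.opened_at = time.monotonic()
```

The router checks `allow()` before trying a route and calls `record()` afterward, so a dead route is skipped instantly instead of burning a timeout on every request.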
- Track consecutive failures per route
- Open after 5 failures, then enter a half-open state
- Probe every 30 seconds; close after 2 successes

Exponential Backoff with Jitter (AWS guidance)
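A sketch of the schedule described below: a 200 ms first retry plus a random offset, doubling thereafter. The `cap` ceiling and the 4-attempt default are assumptions for the example, not figures from the article; Tenacity's built-in waits cover the same ground in real code.

```python
import random
import time

def retry_delay(attempt, base=0.2, cap=10.0):
    """Delay before retry number `attempt` (0-based): exponential
    growth from a 200 ms base plus a random jitter offset, capped."""
    return min(cap, base * (2 ** attempt) + random.uniform(0, base))

def call_with_retries(fn, max_attempts=4):
    """Retry `fn` with jittered backoff; re-raise after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(retry_delay(attempt))
```

The jitter is what prevents the thundering herd: without it, every client that failed at the same moment retries at the same moment too.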
- Prevents a thundering herd during outages
- First retry: 200ms + random offset
- Each retry waits longer, with randomization

Idempotency Keys (Stripe pattern)
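A sketch of the key derivation described below. The in-memory `SEEN` set stands in for what would be a Redis `SETNX` (or similar) in production, and `process_once` is an illustrative helper, not a named function from the article.

```python
import hashlib
import json

def idempotency_key(user_id, job_id, payload):
    """Stable key from user ID + job ID + input. The payload is
    canonicalised with sort_keys so dict ordering can't change the hash."""
    blob = json.dumps([user_id, job_id, payload], sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

SEEN = set()  # stands in for a Redis SETNX in this sketch

def process_once(user_id, job_id, payload, handler):
    """Run `handler` only if this exact job hasn't been processed before."""
    key = idempotency_key(user_id, job_id, payload)
    if key in SEEN:
        return "duplicate-skipped"
    SEEN.add(key)
    return handler(payload)
```

With this in place, a retried or re-queued request hits the same key and is skipped instead of being billed and executed twice.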
- Hash user ID + job ID + input for a unique key
- Prevents duplicate processing on retries
- Essential for safe retry logic

Budget Guardrails
Google's Project Spend Caps
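Alongside Google's own caps, the article's webhook-driven pause flow can be sketched as below. The in-memory dict stands in for Redis, and the function names are illustrative; only the 80% threshold and the webhook → flag → worker-check → manual-resume flow come from the article.

```python
# In-memory stand-in for a Redis flag; use a real Redis key in production.
FLAGS = {"budget_paused": False}
PAUSE_THRESHOLD = 0.8  # flip the flag at 80% of the monthly budget

def budget_webhook(spend_pct):
    """Called by the billing webhook with spend as a fraction of budget."""
    if spend_pct >= PAUSE_THRESHOLD:
        FLAGS["budget_paused"] = True
    return FLAGS["budget_paused"]

def worker_can_run():
    """Queue workers check the flag before processing each job."""
    return not FLAGS["budget_paused"]

def manual_resume():
    """Ops endpoint: clear the flag once spend drops back down."""
    FLAGS["budget_paused"] = False
```

Pausing your own workers matters because provider-side enforcement lags by roughly 10 minutes; the flag stops spend immediately.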
- Set in AI Studio under the Spend tab
- Monthly dollar limits per project
- ~10-minute enforcement delay
- A billing account at $0 balance stops ALL linked projects
- A webhook endpoint receives the spend percentage
- Flips a Redis flag at 80% of the monthly budget
- Queue workers check the flag before processing
- Manual resume endpoint when spend drops

Implementation Overview
- JSON routing config (environment-based URLs)
- 40-line router function in Node or Python
- Redis instance for queuing ($5-15/month)
- Circuit breaker libraries (Cockatiel, PyBreaker)
- Retry libraries (Tenacity for Python)
- Infrastructure: $20-80/month total
- Most months closer to the low end
- Compare to the cost of a missed deliverable ($8K+ for Santi)
- Even $500/month clients will churn on missed deadlines

The Counterargument: Is This Overengineering?
- Adds complexity that creates new failure modes
- Gateway layers can introduce latency and quirks
- Circuit breaker thresholds need calibration
- Most LLM APIs use global endpoints (not regional)
- Start with a circuit breaker + one warm provider (20 lines of code)
- Add the cold queue when ready (Redis + notifications)
- Budget guardrails only if you're spending enough to matter
- For non-technical users: Make/n8n error paths + Google Sheets

The Lisbon Test
- Deploy from a café with sketchy wifi? ✓
- Let an async team operate without you online? ✓
- Survive bad connectivity? ✓
- Block the hot provider's domain locally → confirm warm takeover
- Force 500 errors from the gateway → confirm the circuit opens
- Post a fake budget alert → confirm the pause flag sets

Resources
Download: Nomad-Proof Model Failover SOP
- JSON routing config templates
- Node (Cockatiel + BullMQ) wrapper code
- Python (Tenacity + PyBreaker) implementation
- Redis queue setup with pause flags
- Budget webhook specifications
- Cost comparison spreadsheet
- Lisbon Test validation checklist

Action Items
This Week: Pick primary provider + one warm alternative. Write 20 lines of failover code OR build one error path in Make/n8n. Test it.
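For reference, the "20 lines of failover code" can look roughly like this. It is a sketch under stated assumptions: `primary` and `warm` are placeholder callables for your two provider SDKs, and the trip threshold reuses the 5-failure figure from the circuit breaker section.

```python
# Minimal hot->warm failover with a crude failure counter.
FAILS = {"primary": 0}
TRIP_AT = 5  # stop trying the primary after 5 consecutive failures

def complete(prompt, primary, warm):
    """Try `primary(prompt)` unless it has tripped; fall back to `warm`."""
    if FAILS["primary"] < TRIP_AT:
        try:
            result = primary(prompt)
            FAILS["primary"] = 0  # any success resets the counter
            return result
        except Exception:
            FAILS["primary"] += 1
    return warm(prompt)
```

That is the whole starting point: one counter, one try/except, one fallback. Everything else in this article is layered on top of it.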
This Weekend: Implement the full three-tier system if you're running client-facing AI operations.
The next outage is coming; we just don't know when.