Build a Three-Tier AI Failover That Survives Provider Outages
The Recent Incidents That Changed Everything
April 6-7, 2026: Anthropic's Claude experienced back-to-back days of elevated errors
- April 6: 15:00-16:30 UTC, login errors affecting Claude.ai and Claude Code
- April 7: 14:32-15:12 UTC, same symptoms across login, chats, and voice

March 4, 2026: OpenAI logged elevated API error rates for 30 minutes across multiple models due to simultaneous infrastructure actions
March 16, 2026: Google announced Project Spend Caps for Gemini API with ~10-minute enforcement delays
The Three-Tier Failover Architecture
Hot Tier
- Same provider, different model or endpoint
- Example: Claude Sonnet fails → switch to Claude Haiku
- Handles partial outages where some models still respond
- Uses circuit breakers to detect consecutive failures

Warm Tier
- Completely different provider
- Example: Anthropic down → route to OpenAI or Gemini
- Requires an OpenAI-compatible gateway layer for request normalization
- Test tool calling and JSON mode differences beforehand

Cold Tier
- Graceful degradation + human-in-the-loop
- Requests go into a Redis queue with BullMQ
- Returns 202 (accepted, processing) to the client
- Triggers notifications to the ops team
- Pre-written message templates for client communication

Key Technical Patterns
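Before the individual patterns, here is one way the three tiers above might be wired together. This is a sketch, not the article's actual implementation: the route table entries, the `call` and `enqueue` callables, and the response shapes are all illustrative placeholders you would swap for real SDK calls and a real queue.

```python
# Illustrative three-tier router. Route entries are placeholders;
# a real table would carry environment-based URLs and credentials.
ROUTES = [
    {"tier": "hot",  "provider": "anthropic", "model": "claude-haiku"},
    {"tier": "warm", "provider": "openai",    "model": "fallback-model"},
]

def route_request(payload, call, enqueue):
    """Try hot, then warm; on total failure, fall back to the cold queue.

    `call(route, payload)` performs the provider request and raises on
    failure; `enqueue(payload)` pushes to a Redis/BullMQ-style queue.
    Both are supplied by the caller in this sketch.
    """
    for route in ROUTES:
        try:
            return {"status": 200, "body": call(route, payload)}
        except Exception:
            continue  # in practice, circuit breakers + retries wrap `call`
    # Cold tier: accept the work, process later, notify ops.
    enqueue(payload)
    return {"status": 202, "body": "accepted, processing"}
```

Returning 202 with a queued payload is what lets the client see "accepted, processing" instead of an error while the ops team is notified.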
Circuit Breakers (Martin Fowler pattern)
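A minimal stdlib-only sketch of the breaker, using the thresholds given below (open after 5 failures, probe after 30 seconds, close after 2 successes). In production you would reach for a library like PyBreaker or Cockatiel rather than hand-rolling this.

```python
import time

class CircuitBreaker:
    """Per-route circuit breaker sketch: closed -> open -> half-open."""

    def __init__(self, fail_max=5, reset_timeout=30.0, close_after=2):
        self.fail_max = fail_max          # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds before a half-open probe
        self.close_after = close_after    # successes needed to close again
        self.failures = 0
        self.successes = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow(self):
        """Should the next request be attempted on this route?"""
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # let a probe request through
                return True
            return False
        return True

    def record(self, ok):
        """Report the outcome of an attempt."""
        if ok:
            self.successes += 1
            self.failures = 0
            if self.state == "half-open" and self.successes >= self.close_after:
                self.state = "closed"
        else:
            self.successes = 0
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.fail_max:
                self.state = "open"
                self.opened_at = time.monotonic()
```

The router checks `allow()` before trying a route and calls `record()` afterward, so a dead route is skipped instantly instead of burning a timeout on every request.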
- Track consecutive failures per route
- Open after 5 failures, then enter a half-open state
- Probe every 30 seconds; close after 2 successes

Exponential Backoff with Jitter (AWS guidance)
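A sketch of the schedule described below: a 200 ms first retry plus a random offset, doubling thereafter. The `cap` ceiling and the 4-attempt default are assumptions for the example, not figures from the article; Tenacity's built-in waits cover the same ground in real code.

```python
import random
import time

def retry_delay(attempt, base=0.2, cap=10.0):
    """Delay before retry number `attempt` (0-based): exponential
    growth from a 200 ms base plus a random jitter offset, capped."""
    return min(cap, base * (2 ** attempt) + random.uniform(0, base))

def call_with_retries(fn, max_attempts=4):
    """Retry `fn` with jittered backoff; re-raise after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(retry_delay(attempt))
```

The jitter is what prevents the thundering herd: without it, every client that failed at the same moment retries at the same moment too.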
- Prevents a thundering herd during outages
- First retry: 200ms + random offset
- Each retry waits longer, with randomization

Idempotency Keys (Stripe pattern)
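A sketch of the key derivation described below. The in-memory `SEEN` set stands in for what would be a Redis `SETNX` (or similar) in production, and `process_once` is an illustrative helper, not a named function from the article.

```python
import hashlib
import json

def idempotency_key(user_id, job_id, payload):
    """Stable key from user ID + job ID + input. The payload is
    canonicalised with sort_keys so dict ordering can't change the hash."""
    blob = json.dumps([user_id, job_id, payload], sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

SEEN = set()  # stands in for a Redis SETNX in this sketch

def process_once(user_id, job_id, payload, handler):
    """Run `handler` only if this exact job hasn't been processed before."""
    key = idempotency_key(user_id, job_id, payload)
    if key in SEEN:
        return "duplicate-skipped"
    SEEN.add(key)
    return handler(payload)
```

With this in place, a retried or re-queued request hits the same key and is skipped instead of being billed and executed twice.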
- Hash user ID + job ID + input for a unique key
- Prevents duplicate processing on retries
- Essential for safe retry logic

Budget Guardrails
Google's Project Spend Caps
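Alongside Google's own caps, the article's webhook-driven pause flow can be sketched as below. The in-memory dict stands in for Redis, and the function names are illustrative; only the 80% threshold and the webhook → flag → worker-check → manual-resume flow come from the article.

```python
# In-memory stand-in for a Redis flag; use a real Redis key in production.
FLAGS = {"budget_paused": False}
PAUSE_THRESHOLD = 0.8  # flip the flag at 80% of the monthly budget

def budget_webhook(spend_pct):
    """Called by the billing webhook with spend as a fraction of budget."""
    if spend_pct >= PAUSE_THRESHOLD:
        FLAGS["budget_paused"] = True
    return FLAGS["budget_paused"]

def worker_can_run():
    """Queue workers check the flag before processing each job."""
    return not FLAGS["budget_paused"]

def manual_resume():
    """Ops endpoint: clear the flag once spend drops back down."""
    FLAGS["budget_paused"] = False
```

Pausing your own workers matters because provider-side enforcement lags by roughly 10 minutes; the flag stops spend immediately.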
- Set in AI Studio under the Spend tab
- Monthly dollar limits per project
- ~10-minute enforcement delay
- A billing account at $0 balance stops ALL linked projects
- A webhook endpoint receives the spend percentage
- Flips a Redis flag at 80% of the monthly budget
- Queue workers check the flag before processing
- Manual resume endpoint when spend drops

Implementation Overview
- JSON routing config (environment-based URLs)
- 40-line router function in Node or Python
- Redis instance for queuing ($5-15/month)
- Circuit breaker libraries (Cockatiel, PyBreaker)
- Retry libraries (Tenacity for Python)
- Infrastructure: $20-80/month total
- Most months closer to the low end
- Compare to the cost of a missed deliverable ($8K+ for Santi)
- Even $500/month clients will churn on missed deadlines

The Counterargument: Is This Overengineering?
- Adds complexity that creates new failure modes
- Gateway layers can introduce latency and quirks
- Circuit breaker thresholds need calibration
- Most LLM APIs use global endpoints (not regional)
- Start with a circuit breaker + one warm provider (20 lines of code)
- Add the cold queue when ready (Redis + notifications)
- Budget guardrails only if you're spending enough to matter
- For non-technical users: Make/n8n error paths + Google Sheets

The Lisbon Test
- Deploy from a café with sketchy wifi? ✓
- Let an async team operate without you online? ✓
- Survive bad connectivity? ✓
- Block the hot provider's domain locally → confirm warm takeover
- Force 500 errors from the gateway → confirm the circuit opens
- Post a fake budget alert → confirm the pause flag sets

Resources
Download: Nomad-Proof Model Failover SOP
- JSON routing config templates
- Node (Cockatiel + BullMQ) wrapper code
- Python (Tenacity + PyBreaker) implementation
- Redis queue setup with pause flags
- Budget webhook specifications
- Cost comparison spreadsheet
- Lisbon Test validation checklist

Action Items
This Week: Pick primary provider + one warm alternative. Write 20 lines of failover code OR build one error path in Make/n8n. Test it.
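For reference, the "20 lines of failover code" can look roughly like this. It is a sketch under stated assumptions: `primary` and `warm` are placeholder callables for your two provider SDKs, and the trip threshold reuses the 5-failure figure from the circuit breaker section.

```python
# Minimal hot->warm failover with a crude failure counter.
FAILS = {"primary": 0}
TRIP_AT = 5  # stop trying the primary after 5 consecutive failures

def complete(prompt, primary, warm):
    """Try `primary(prompt)` unless it has tripped; fall back to `warm`."""
    if FAILS["primary"] < TRIP_AT:
        try:
            result = primary(prompt)
            FAILS["primary"] = 0  # any success resets the counter
            return result
        except Exception:
            FAILS["primary"] += 1
    return warm(prompt)
```

That is the whole starting point: one counter, one try/except, one fallback. Everything else in this article is layered on top of it.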
This Weekend: Implement the full three-tier system if you're running client-facing AI operations.
The next outage is coming; we just don't know when.