Smooth Scaling: System Design for High Traffic

Database Scaling at Intercom: Aurora, PlanetScale & Incident Response with Engineering Director Ryan Sherlock



In this episode of Smooth Scaling, José Quaresma talks with Ryan Sherlock, Director of Engineering at Intercom, about the realities of scaling databases in a fast-growing SaaS product. Ryan shares Intercom’s journey from a single MySQL database through Aurora, proxies, and per-customer scaling patterns—and what eventually pushed the team toward PlanetScale. The conversation also explores Intercom’s heartbeat-based approach to incident detection and response, focusing on customer impact rather than infrastructure metrics.


---

  • (00:00) - Intro and episode overview
  • (01:14) - Early scaling pains: systems going down every day
  • (02:56) - Database evolution: MySQL, caching, Aurora, and ProxySQL
  • (07:36) - Tens of billions of rows and the table Intercom couldn’t migrate
  • (09:07) - Intercom’s multi-region architecture and the EU region
  • (10:59) - Why Intercom moved from Aurora to PlanetScale (Vitess)
  • (15:12) - PlanetScale in practice: shards, VTGate, and zero-downtime upgrades
  • (22:39) - Heartbeat metrics and automated incident response
  • (30:03) - AWS outage case study: DynamoDB failure and real-time recovery
  • (34:17) - Incident mitigation lessons: “I’m now a web box” and VTGate limits
  • (41:40) - Rapid fire questions: books, career advice, and scalability mindset

  • Ryan Sherlock is Senior Director of Engineering at Intercom in Dublin, where he leads the core technologies and infrastructure groups that power Intercom’s AI-first customer service platform. Through talks and writing on the Intercom engineering blog, he shares practical playbooks on scaling infrastructure and engineering enablement, running high-leverage incident response, and using heartbeat metrics to tie reliability directly to real customer outcomes rather than just server graphs. Outside Intercom, he serves on the board of the Rails Foundation, helping steward the future of the Ruby on Rails ecosystem. Before moving into tech leadership, Ryan spent several years as a professional cyclist, an experience he wrote about in “Why you should have skin in the engineering game” and one that still shapes how he thinks about risk, ownership, and reliability in software.

    This podcast is hosted by José Quaresma, researched by Joseph Thwaites and produced by Perseu Mandillo. 

    © Queue-it, 2026

    Smooth Scaling: System Design for High Traffic, by Queue-it
