Tern Stories

"We Should Be Able to Drain an AZ" | Ep. 10



In this episode, Cooper Bethea, a senior staff engineer at Slack, shares his journey transforming Slack's architecture into a cellular model that allows for more resilient operations.

Cooper recounts the frustrations of dealing with outages and the decision-making process that led to the migration to a cellular architecture. He explains how early struggles with service discovery and load balancing prompted a reevaluation of Slack's infrastructure.

The key insight was that draining an availability zone (AZ) should be a routine operation executed with confidence, not a rare, high-stakes event. Cooper discusses the importance of incremental changes and how the team practiced draining traffic from AZs during quiet periods, which ultimately led to a more robust system.

This episode is a must-listen for anyone interested in infrastructure, reliability, and the challenges of scaling systems in a cloud environment.

-----

Get Tern Stories in your inbox: https://tern.sh/youtube

Connect with Cooper ➡️ https://www.linkedin.com/in/cooper-bethea-521936201/


Tern Stories, by Tern - AI Code Migrations