The Critical Channel

Episode 9: The Bug Team (Incidents Part One)



The Problem: You may only know a single tcpdump command, but you're sure as hell going to use it.

We're trying something new this week: it's a two-parter! We decided to live up to our name and talk about critical incident response procedures.

In this first half, we talk about how to craft a sustainable on-call rotation, how to compensate your engineers for living with the stress of on-call, and how to convince management that you need an on-call rotation.

Plus, Warnar definitely does not advocate for drink-driving. Don't do that.

Links:

  • MTBF, MTTR, MTTA, and MTTF
  • Crafting Sustainable On-Call Rotations
  • How Monzo do on-call
  • How Monzo's on-call system evolved
  • Google SRE: Error Budgets and Maintenance Windows
  • Fifth Gear: What's Worse, Drink Driving or Driving Tired?

The Critical Channel, by criticalchannel.io