December 11, 2025

Cloud Outages: Lessons in Resilience and Error Budgets

45 minutes

In this episode of 10,000 Feet, Nate Sherman and Matt Glenn dive into the recent wave of major cloud outages impacting AWS, Azure, and Cloudflare, exploring what went wrong and why these failures are so disruptive. They discuss the growing risks of globally applied changes, the importance of error budgets, and strategies for building resilience in modern infrastructure. The conversation also covers best practices in site reliability engineering, monitoring, and alerting, as well as the role of AI and automation in change management. Packed with insights for architects, SREs, and IT leaders, this episode offers practical guidance on balancing speed, reliability, and risk in today’s cloud-driven world.

By Vervint
