
In this exciting episode of Cloud Dialogues, we are joined by Liz Fong-Jones, Field CTO at Honeycomb and former Google SRE, to explore the fascinating world of Site Reliability Engineering (SRE)—a game-changer for scaling and automating large systems.
What We Covered:
1. Meet Liz Fong-Jones: Liz brings over a decade of SRE experience from her time at Google and Honeycomb, helping companies revolutionize how they manage reliability and automation.
2. The Origin Story: SRE actually predates the cloud! Born at Google in the early 2000s, SRE started as a way to automate manual system administration tasks and has since evolved into its own discipline, running parallel to DevOps.
3. SRE at Its Core:
4. Different SRE Models: There are different ways to implement SRE:
5. The SRE Mindset: Curiosity and empathy are essential for SREs. Teams need a culture of psychological safety where concerns can be raised without fear.
6. The Magic of SLOs and SLIs: SLOs set reliability targets (like aiming for 99.5% uptime), while SLIs are the measurements that show how the service is actually performing against those targets. Together, they tell you whether your systems are reliable enough for your users.
7. FinOps Meets SRE: Liz explains how SREs can help balance reliability, performance, and costs using SLOs to allocate resources more efficiently.
8. Disaster Testing: Want proof SREs are ready for anything? Honeycomb regularly tests its disaster recovery by taking down an entire availability zone—on purpose!
9. Pro Tips for Executives: Thinking about implementing SRE at your company? Liz suggests starting with your biggest challenges, offering executive support, and setting clear, achievable SLOs.
10. Why Observability Matters: Observability is the backbone of SRE. Having real-time, actionable data is key for setting and managing effective SLOs.
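To make the SLO and SLI point above concrete, here is a minimal Python sketch of the arithmetic behind a 99.5% target. It is an illustration only, not something shown in the episode: the function names, the 30-day window, and the request counts are all assumptions chosen for the example.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of unavailability an SLO allows over a window (assumed 30 days)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def sli_availability(good_events: int, total_events: int) -> float:
    """The SLI: the measured fraction of events that were good."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    slo = 0.995  # the 99.5% example target from the episode notes
    # Hypothetical traffic: 100,000 requests, of which 99,800 succeeded.
    good, total = 99_800, 100_000
    print(f"Allowed downtime: {error_budget_minutes(slo):.0f} minutes per 30 days")
    print(f"Measured SLI: {sli_availability(good, total):.4f}")
    print(f"Error budget remaining: {budget_remaining(slo, good, total):.0%}")
```

The SLO is the fixed target, the SLI is what you measure, and the gap between them is the budget a team can spend on releases, experiments, or cost savings before reliability work has to take priority.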
Plus, Liz talks about her favorite ARM processors (for cost and environmental savings) and shares insights from her book Observability Engineering.
This episode is a deep dive into SRE, filled with actionable insights and strategies for leaders looking to supercharge their reliability game. You won’t want to miss it!