If we don’t test all the code, entire global systems can go down—and it’s not as simple as it seems.
In this episode I speak with Colette Alexander, Director of Site Reliability Engineering at HashiCorp, about software failures, risk in software testing, and resilience in engineering. We dive deep into the CrowdStrike outage, exploring why skipping tests in software releases can have catastrophic effects. But it’s not just about a software company making a bad call; it’s about the trade-off between speed and safety, how software engineers balance risk, and why testing everything isn’t always an option.
Colette shares insights from the world of site reliability engineering (SRE), drawing parallels with aviation, space disasters like Challenger, and even the psychology of teamwork in rock bands. This episode is a must-listen for anyone in software development, DevOps, cybersecurity, or engineering leadership who wants to understand the real-world impact of software testing decisions.
Topics Covered:
✔ The CrowdStrike outage: What went wrong?
✔ The hidden risks of incomplete software testing
✔ How software engineers balance speed, security, and resilience
✔ The psychological and organizational pressures behind failures
✔ Lessons from music, space disasters, and high-risk industries
How Complex Systems Fail – Richard Cook
Mission Improbable: Using Fantasy Documents to Tame Disaster – Lee Clarke
Fantasy Documents and Disaster: A Case Study of the Long Island Nuclear Plant – Lee Clarke & Charles Perrow
Beyond Blame: Learning From Failure and Success – Dave Zwieback
Podcast: This Is Fine
Podcast Website: thisisfinepod.com
Transforming teams. Unlocking human potential.
Using principles from Human Factors (HF), High-Reliability Organisations (HRO), and Human and Organisational Performance (HOP), we develop and deliver highly immersive and impactful programmes using the High-Velocity Learning LAB (HVLL) concept. We give you the know-how, the tools and the support to make results stick and empower your people to achieve the extraordinary. We help you answer the question "How do we uncover those hidden stories in our organisation?"