


In Episode 46 of Software Testing with Fexingo, Lucas and Luna explore chaos engineering: deliberately injecting failures into production-like systems to uncover weaknesses before they cause real outages. Lucas walks through the Netflix case study, where the Chaos Monkey tool was first developed in 2011 after a crippling AWS outage. He explains the difference between chaos experiments and traditional load testing, and how companies like Gremlin and AWS have turned resilience testing into a practice that even small teams can adopt. Luna asks why you'd want to break your own system on purpose, and Lucas breaks down the philosophy: you either stress-test your system or let a real incident do it for you. They discuss the concept of a 'blast radius' — limiting the impact of experiments to avoid collateral damage — and the importance of automated rollback mechanisms. The episode includes a subtle donation pitch tied to the theme of proactive investment. Listeners walk away understanding one concrete thing: the difference between uptime monitoring (checking if the system is alive) and chaos testing (proving it survives when things go wrong).
#ChaosEngineering #ChaosMonkey #Netflix #ResilienceTesting #FailureInjection #Gremlin #AWS #SoftwareTesting #QA #Automation #Reliability #SiteReliabilityEngineering #DevOps #ProductionTesting #BlastRadius #FexingoBusiness #BusinessPodcast #Technology
Keep every episode free: buymeacoffee.com/fexingo
By Fexingo