Code[ish]

Chaos Engineering


Rick Newman interviews Mikolaj Pawlikowski, who recently wrote a book called "Chaos Engineering: Crash test your applications." The idea behind chaos engineering is to "break things on purpose" in your operational flow: you deliberately inject failures that might occur in production, ahead of time, so that you can anticipate them and implement workarounds and corrections. The practice is most often used for large, distributed systems, which have many points of failure, but it can be useful in any architecture.

One of the obstacles to embracing chaos engineering is getting buy-in from teammates, or even managers. Once a feature is complete and the unit tests are passing, it can be difficult to convince someone that resiliency work needs to continue, because preparing for a disaster has no visible or tangible benefit. Mikolaj suggests clearly laying out for colleagues the ways a system can fail and the impact each failure can have on the application or business. Rather than fearmongering, it can be useful to point to other companies' availability issues as cautionary tales. Mikolaj also says that chaos engineering doesn't need to focus solely on complicated problems like race conditions across distributed systems. Often there's enough low-hanging fruit worth fixing, such as disk space running out or an API timing out.
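
To make the "API timing out" case concrete, here is a minimal sketch of a fault-injection experiment in Python. It is not taken from the episode or from Mikolaj's book; every name in it (`with_injected_timeouts`, `fetch_price`, `price_with_fallback`, `run_experiment`) is hypothetical, and the "API" is a local stand-in function rather than a real network call. The idea is simply to make a fraction of calls fail on purpose and check that the calling code degrades gracefully.

```python
import random


class InjectedTimeout(TimeoutError):
    """Raised by the fault injector to simulate an API call timing out."""


def with_injected_timeouts(func, failure_rate, rng):
    """Wrap func so that a fraction of calls raise InjectedTimeout instead of running."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedTimeout("simulated timeout")
        return func(*args, **kwargs)
    return wrapper


def fetch_price(item):
    """Hypothetical stand-in for a remote API call."""
    return {"widget": 10, "gadget": 25}[item]


def price_with_fallback(fetch, item, cached):
    """Code under test: fall back to a cached value when the call times out."""
    try:
        return fetch(item)
    except TimeoutError:
        return cached[item]


def run_experiment(trials=1000, failure_rate=0.3, seed=42):
    """Inject timeouts into fetch_price and count how the caller copes."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    flaky_fetch = with_injected_timeouts(fetch_price, failure_rate, rng)
    cached = {"widget": 9, "gadget": 24}  # deliberately differs from live prices
    fallbacks = 0
    unhandled = 0
    for _ in range(trials):
        try:
            price = price_with_fallback(flaky_fetch, "widget", cached)
        except Exception:
            unhandled += 1  # the failure escaped the code under test
            continue
        if price == cached["widget"]:
            fallbacks += 1  # the caller survived by using the cached value
    return {"trials": trials, "fallbacks": fallbacks, "unhandled": unhandled}


if __name__ == "__main__":
    print(run_experiment())
```

If `unhandled` stays at zero while `fallbacks` is non-zero, the injected timeouts were absorbed by the fallback path. A real experiment would target an actual dependency and watch production-like metrics, but the shape is the same: state the expected behavior, inject the failure, and compare.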

The chaos engineering mindset can also extend beyond pure software. If you think of people working across different timezones as a distributed system, you can prepare for communication failures before they occur. Everyone works at a different pace, and communication issues can be analogous to network loss. Rather than fixing miscommunications after they happen, establishing shared practices (like writing up notes from every meeting, or setting up playbooks) can go a long way toward ensuring that everyone is able to do their best under changing circumstances.

Links from this episode
  • Mikolaj's book is called Chaos Engineering: Crash test your applications -- get a 40% discount using the code podish19
  • powerfulseal is a testing tool for Kubernetes clusters
  • Mikolaj distributes the Chaos Engineering Newsletter
  • Conf42 is a conference focusing on high-level computer science
  • ChaosConf is the world’s largest Chaos Engineering event
  • Awesome Chaos Engineering is a curated list of Chaos Engineering resources

Code[ish] by Heroku from Salesforce


