


Tech Talks are in-depth technical discussions.
As a system becomes more complex, the chance of failure increases. At a large enough scale, failures are inevitable. Incident response is the practice of preparing for and effectively recovering from these failures.
An engineering team can use checklists and runbooks to minimize failures and put a plan in place for responding to them. Afterward, it can use postmortems to reflect on a failure and take full advantage of its lessons.
Emil Stolarsky is a production engineer at Shopify, where his role shares many similarities with that of Google's site reliability engineers. In this episode, Emil argues that the academic study of emergency management, along with industries such as aerospace and transportation, has a lot to teach software engineers about responding to production problems.
Emil also argues that we need to replace tribal knowledge with practices such as an incident command system and rigorous use of checklists, and that we should leave behind the mindset of "move fast and break things" in favor of more deliberate preparation.
By Adam Gordon Bell - Software Developer
4.9 (188 ratings)
