Three Devs and a Maybe

148: Site Reliability Engineering with Niall Murphy



In this week’s episode we are lucky to be joined by Niall Murphy to discuss the discipline of Site Reliability Engineering.

We start off by speaking about how he got into computing, how the SRE role came to be and what drew him to it.
From here, we highlight the position of an SRE within a company/group, what SLAs are, and the benefits of capping operations work at 50% and of running blameless postmortems.
This leads us to talk about why striving for 100% uptime is actually detrimental to the product, and the benefits of having an Error Budget.
Finally, we discuss how the role has evolved since its inception, the Wheel of Misfortune and what drew him to contribute to the seminal SRE book.

Show Links
  • Niall Murphy on Twitter
  • Fourth Doctor - Wikipedia
  • Niall Murphy - Research at Google
  • The History of the Irish Internet
  • Google - Site Reliability Engineering Book
  • How SRE relates to DevOps
  • Keys to SRE - YouTube
  • Hyrum’s Law
  • Prometheus - Monitoring system and time series database

    Three Devs and a Maybe, by Michael Budd, Fraser Hart, Lewis Cains, Edd Mann

    4.6 (11 ratings)