January 04, 2021

SRE Introduction

23 minutes

This week we read the foreword, preface, and first two chapters of Site Reliability Engineering. We discuss the origins and basic tenets of SRE, look at how Google manages risk, and consider how we can incorporate SRE into our own work. You can join our free discussions Thursdays at 7 pm Eastern by signing up at https://www.bookclub.dev/thursdays.

Resources

The Wheel of Time Series (Amazon)
Awareness: The Perils and Opportunities of Reality (Amazon)
SRE Book companion site
Principles of Network and System Administration (Amazon)
Practical Reliability Engineering (Amazon)
Facts and Fallacies of Software Engineering (Amazon)
The Factors That Impact Availability, Visualized
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition
A Study of Non-Blocking Switching Networks
Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network
B4: Experience with a Globally-Deployed Software Defined WAN
BwE: Flexible, Hierarchical Bandwidth Allocation for WAN Distributed Computing
Large-scale cluster management at Google with Borg
MapReduce: Simplified Data Processing on Large Clusters
The Google File System
Bigtable: A Distributed Storage System for Structured Data
Spanner: Google’s Globally-Distributed Database
The Chubby Lock Service for Loosely-Coupled Distributed Systems
Searching for Build Debt: Experiences Managing Technical Debt at Google
The Motivation for a Monolithic Codebase: Why Google stores billions of lines of code in a single repository
Borg, Omega, and Kubernetes

By Dan Cook