The full story behind a highly successful "immune system" we implemented for the "New Google Sites". We will discuss key decisions and present the design of the system we built, touching on a wide range of DevOps topics:

- Production tests: WebDriver vs. RPC probers
- What environments we have and how they are pushed (CI/CD, etc.)
- Capacity testing, load testing, and canarying
- What worked, what went wrong, and what the surprising wins were
- Blackbox vs. whitebox alerting
- Capacity planning and request costs
- What the on-caller dashboard looks like
- Postmortem culture