
Sign up to save your podcasts
Or
In this episode Lexiang Huang talks about a framework for understanding a class of failures in distributed systems called metastable failures. Lexiang tells us about his study on the prevalence of such failures in the wild and how he and his colleagues scoured over publicly available incident reports from many organizations, ranging from hyperscalers to small companies. Listen to the episode to find out about his main findings and gain a deeper understanding of metastable failures and how you can identity, prevent, and mitigate against them!
Hosted on Acast. See acast.com/privacy for more information.
5
66 ratings
In this episode Lexiang Huang talks about a framework for understanding a class of failures in distributed systems called metastable failures. Lexiang tells us about his study on the prevalence of such failures in the wild and how he and his colleagues scoured over publicly available incident reports from many organizations, ranging from hyperscalers to small companies. Listen to the episode to find out about his main findings and gain a deeper understanding of metastable failures and how you can identity, prevent, and mitigate against them!
Hosted on Acast. See acast.com/privacy for more information.
284 Listeners
621 Listeners
111,864 Listeners
47 Listeners
28 Listeners
18 Listeners
491 Listeners