
Sign up to save your podcasts
Or


In this episode Lexiang Huang talks about a framework for understanding a class of failures in distributed systems called metastable failures. Lexiang tells us about his study on the prevalence of such failures in the wild and how he and his colleagues scoured over publicly available incident reports from many organizations, ranging from hyperscalers to small companies. Listen to the episode to find out about his main findings and gain a deeper understanding of metastable failures and how you can identity, prevent, and mitigate against them!
Hosted on Acast. See acast.com/privacy for more information.
By Jack Waudby5
66 ratings
In this episode Lexiang Huang talks about a framework for understanding a class of failures in distributed systems called metastable failures. Lexiang tells us about his study on the prevalence of such failures in the wild and how he and his colleagues scoured over publicly available incident reports from many organizations, ranging from hyperscalers to small companies. Listen to the episode to find out about his main findings and gain a deeper understanding of metastable failures and how you can identity, prevent, and mitigate against them!
Hosted on Acast. See acast.com/privacy for more information.

290 Listeners

623 Listeners

181 Listeners

5,471 Listeners

30 Listeners

133 Listeners