


"We no longer felt confident about what the exact operational boundaries of our cluster were supposed to be."
In early 2021, the observability company Honeycomb dealt with a series of outages related to its Kafka architecture migration, culminating in a 12-hour incident, an extremely long outage for the company. In this episode, we chat with two engineers involved in these incidents, Liz Fong-Jones and Fred Hebert, about the backstory summarized in this meta-analysis they published in May.
We cover a wide range of topics beyond the specific technical details of the incident (which we also discuss), including:
Resources mentioned in the episode:
Published in partnership with Indeed.
By Courtney Nash