


Discover how to build resilient Kubernetes environments at scale with practical automation strategies from an engineer who's tackled complex production challenges.
Grzegorz Głąb, Kubernetes Engineer at Cloud Kitchens, shares his team's journey developing a comprehensive self-healing framework. He explains how they addressed issues ranging from spot node preemptions to network packet drops caused by unbalanced IRQs, providing concrete examples of automation that prevents downtime and improves reliability.
You will learn:
How managed Kubernetes services like AKS provide benefits but require customization for specific use cases
The architecture of an effective self-healing framework built from Kubernetes-native components, using DaemonSets and Deployments
Practical solutions for common challenges like StatefulSet pods stuck on unreachable nodes and cleaning up orphaned pods
Techniques for workload-level automation, including throttling CPU-hungry pods and automating diagnostic data collection
Sponsor
This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/yg_fkP0LN
Interested in sponsoring an episode? Learn more.
By KubeFM
