


Alessandro Pomponio from IBM Research explains how his team transformed their chaotic bare-metal clusters into a well-governed, self-service platform for AI and scientific workloads. He walks through their journey from manual cluster interventions to a fully automated GitOps-first architecture using ArgoCD, Kyverno, and Kueue to handle everything from policy enforcement to GPU scheduling.
You will learn:
How to implement GitOps workflows that reduce administrative burden while maintaining governance and visibility across multi-tenant research environments
Practical policy enforcement strategies using Kyverno to prevent GPU monopolization, block interactive pod usage, and automatically inject scheduling constraints
Fair resource sharing techniques with Kueue to manage scarce GPU resources across different hardware types while supporting both specific and flexible allocation requests
Organizational change management approaches for gaining stakeholder buy-in, upskilling admin teams, and communicating policy changes to research users
Sponsor
This episode is brought to you by Testkube—the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/5sK7BFZ-8
By KubeFM
