Platform Engineering Playbook Podcast

KubeCon Atlanta 2025 Part 1: AI Goes Native and the 30K Core Lesson



Google donates a GPU driver live on stage. OpenAI saves $2.16M/month with one line of code. Kubernetes rollback finally works after 10 years. What changed at KubeCon Atlanta 2025 that proves Kubernetes isn't adapting to AI—it's being rebuilt for it?

This is Part 1 of our three-part deep dive into KubeCon Atlanta 2025 (November 10-13). Over three episodes, we're covering the CNCF's 10-year anniversary, the announcements reshaping platform engineering, and the honest conversations about ecosystem sustainability.

Key Topics Covered:

• Dynamic Resource Allocation (DRA) reaches GA in Kubernetes 1.34 - prevents 10-40% GPU performance loss from NUMA misalignment ($200K/day waste at 100-node scale)
• CPU DRA driver announced - enables Kubernetes + Slurm integration for HPC workloads (computational fluid dynamics, molecular modeling, financial simulations)
• Workload API arrives in alpha for gang-scheduling multi-pod AI training jobs - eliminates partial failure waste
• OpenAI freed 30,000 CPU cores ($2.16M/month in savings) by disabling inotify in Fluent Bit after profiling showed 35% of CPU time going to fstat64
• Kubernetes rollback achieves 99.99% success rate after 10 years - skip-version upgrades now supported
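
For a rough sense of how 30,000 cores turns into $2.16M/month, here is a back-of-the-envelope sketch. The episode notes do not state the per-core price; the $0.10 per core-hour rate and the 30-day month below are assumptions chosen because they happen to reproduce the quoted figure, not numbers from OpenAI.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month of continuous use


def monthly_cpu_cost(cores: int, usd_per_core_hour: float,
                     hours: int = HOURS_PER_MONTH) -> float:
    """Return the monthly cost in USD of running `cores` cores continuously."""
    return cores * usd_per_core_hour * hours


# Assumed rate: $0.10 per core-hour (hypothetical, not stated in the notes).
savings = monthly_cpu_cost(30_000, 0.10)
print(f"${savings:,.0f}/month")
```

Plugging in your own core count and blended rate gives a quick estimate of what a similar fix would be worth on your fleet.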

Tomorrow in Part 2: Platform engineering reaches consensus on three principles, real-world case studies from Intuit/Bloomberg/ByteDance, and the "puppy for Christmas" anti-pattern.

Monday Action Plan:

1. Test Kubernetes 1.34 in development with DRA enabled
2. Profile your highest-CPU service with perf or eBPF (spend 30 minutes)
3. Check for NUMA misalignment in GPU workloads
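
As a starting point for step 3, the sketch below compares the NUMA node of a GPU with the CPUs a workload is allowed to run on, using the standard Linux sysfs files `numa_node` and `cpulist`. The function names and the `sysfs` parameter are our own, and the approach is a minimal illustration rather than the method discussed in the episode.

```python
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def gpu_numa_node(pci_addr: str, sysfs: Path = Path("/sys")) -> int:
    """Return the NUMA node of a PCI device, or -1 if the kernel reports none."""
    return int((sysfs / "bus/pci/devices" / pci_addr / "numa_node").read_text())


def misaligned_cpus(pci_addr: str, allowed_cpus: set[int],
                    sysfs: Path = Path("/sys")) -> set[int]:
    """Return the allowed CPUs that sit on a different NUMA node than the GPU."""
    node = gpu_numa_node(pci_addr, sysfs)
    if node < 0:
        return set()  # topology unknown, so there is nothing to compare
    cpulist = (sysfs / f"devices/system/node/node{node}/cpulist").read_text()
    return allowed_cpus - parse_cpulist(cpulist)
```

Inside a GPU pod you could call `misaligned_cpus("0000:3b:00.0", os.sched_getaffinity(0))`, where the PCI address is a placeholder for your device. A non-empty result means some of the workload's CPUs are remote from the GPU.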

Full Episode Page: https://platformengineeringplaybook.com/podcasts/00035-kubecon-2025-ai-native

Read the Complete Blog Post: https://platformengineeringplaybook.com/blog/2025/11/24/kubecon-atlanta-2025-recap

Part of the Platform Engineering Playbook Podcast series. Open source, community-driven content for senior platform engineers, SREs, and DevOps engineers.


By vibesre