Ship It Weekly - DevOps, SRE, and Platform Engineering News

Kubernetes Shake-ups, Platform Reality, and AI-Native SRE



In this episode of Ship It Weekly, Brian digs into three big themes for anyone running Kubernetes or building internal platforms.

First, Kubernetes is officially retiring Ingress NGINX and moving it into best-effort maintenance until March 2026. We talk about what that actually means if you’re still using it and how to think about choosing and rolling out a replacement ingress controller.

Second, we look at how CNCF is defining platform engineering and what “platform as a product” looks like in practice, plus some hard-earned lessons from running Kubernetes in production.

Third, we talk about AI as a first-class workload on Kubernetes. CNCF’s new Certified Kubernetes AI Conformance Program aims to standardize how AI runs on K8s, and recent writing on SRE in the age of AI looks at what reliability means when systems learn and drift.

In the lightning round, we hit good reads on database migrations, Postgres upgrades, and a distributed priority queue on Kafka. We wrap with the human side of incidents: fixation during incident response and using incidents as landmarks for the tradeoffs you’ve been making over time.

If you’re on a platform team, responsible for SLOs, or the person people ping when “Kubernetes is weird,” this one should give you concrete questions to take back to your roadmap and runbooks.

Links from this episode

https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/

https://www.haproxy.com/blog/ingress-nginx-is-retiring

https://www.cncf.io/blog/2025/11/19/what-is-platform-engineering/

https://www.cncf.io/announcements/2025/11/11/cncf-launches-certified-kubernetes-ai-conformance-program-to-standardize-ai-workloads-on-kubernetes/

https://devops.com/sre-in-the-age-of-ai-what-reliability-looks-like-when-systems-learn/

Lightning round

https://www.cncf.io/blog/2025/11/18/top-5-hard-earned-lessons-from-the-experts-on-managing-kubernetes/

https://www.tines.com/blog/zero-downtime-database-migrations-lessons-from-moving-a-live-production

https://palark.com/blog/postgresql-upgrade-no-data-loss-downtime/

https://klaviyo.tech/building-a-distributed-priority-queue-in-kafka-1b2d8063649e

https://sreweekly.com/sre-weekly-issue-497/

https://ferd.ca/ongoing-tradeoffs-and-incidents-as-landmarks.html


By Teller's Tech - DevOps SRE Podcast
