


**87% of AI workloads are sitting idle on GPUs right now**, yet companies keep buying more hardware. What if the problem isn't capacity, but how we're running AI on Kubernetes?
In today's Platform Engineering Playbook, we tackle the massive inefficiencies plaguing AI infrastructure at scale. You'll discover why traditional Kubernetes patterns break down with AI workloads, what's actually happening under the hood when you try to serve ML models in production, and concrete strategies to fix GPU utilization without throwing more money at the problem.
**What You'll Learn:**
**Timestamps:**
Whether you're scaling AI workloads or just trying to understand why your GPU bills keep growing while performance stays flat, this episode gives you the platform engineering perspective you need.
**Sources & References:**
#PlatformEngineering #DevOps #CloudNative #Kubernetes
By vibesre