January 26, 2026

Karpenter Lifecycle: How GPU Pods Get Unstuck

39 minutes

A pending ML training job needing 8 GPUs is a classic Karpenter interview scenario — here's the exact four-step lifecycle an interviewer expects you to walk through.

You'll learn:

Why the K8s scheduler marks pods unschedulable and how Karpenter's controller watches for that signal

How Karpenter evaluates all pod constraints at once — resource requests, nodeSelector, nodeAffinity, tolerations, and topology spread

How it picks an instance type that fits (p3.16xlarge for 8 GPUs) and launches it through the EC2 API in the correct availability zone

Why Karpenter provisions the node but the K8s scheduler still does the final pod binding — a gotcha that trips up a lot of candidates
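To make the constraint-evaluation step concrete, here is a toy Python sketch of the idea, not Karpenter's actual code. The names `PendingPod`, `InstanceType`, `pick_instance`, and the zone availability in `CATALOG` are all hypothetical; only the GPU counts for the p3 family are real. It assumes the simplest possible policy: keep instance types that satisfy every constraint at once, then take the smallest one that fits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    gpus: int
    zones: frozenset  # zones where the type is offered (made up for this sketch)


@dataclass(frozen=True)
class PendingPod:
    gpu_request: int
    allowed_zones: frozenset  # what nodeSelector / nodeAffinity permit
    tolerations: frozenset    # taint keys the pod tolerates


# Illustrative catalog: GPU counts are real for p3, zone availability is assumed.
CATALOG = [
    InstanceType("p3.2xlarge", 1, frozenset({"us-east-1a", "us-east-1b"})),
    InstanceType("p3.8xlarge", 4, frozenset({"us-east-1a", "us-east-1b"})),
    InstanceType("p3.16xlarge", 8, frozenset({"us-east-1b"})),
]


def pick_instance(pod, catalog, node_taints):
    """Return (instance_name, zone) satisfying all constraints, or None."""
    # The pod must tolerate every taint the new node would carry.
    if not node_taints <= pod.tolerations:
        return None

    candidates = []
    for itype in catalog:
        zones = itype.zones & pod.allowed_zones
        # Resource request and zone constraints are checked together.
        if itype.gpus >= pod.gpu_request and zones:
            candidates.append((itype.gpus, itype.name, sorted(zones)[0]))

    if not candidates:
        return None

    # Smallest instance that fits; a real provisioner also weighs price
    # and capacity, which this sketch ignores.
    _, name, zone = min(candidates)
    return name, zone


pod = PendingPod(
    gpu_request=8,
    allowed_zones=frozenset({"us-east-1a", "us-east-1b"}),
    tolerations=frozenset({"nvidia.com/gpu"}),
)
choice = pick_instance(pod, CATALOG, frozenset({"nvidia.com/gpu"}))
print(choice)
```

Note that the sketch stops at choosing a node. As the last point above says, placing the pod on that node is still the Kubernetes scheduler's job once the node registers and becomes ready.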

Keywords: Karpenter node provisioning, Kubernetes GPU scheduling, pending pods interview question, Karpenter vs cluster autoscaler, K8s scheduler lifecycle

🎧 Listen, then go deeper — DevOps & Cloud interview-prep ebooks at DevOpsInterview.Cloud

By https://DevOpsInterview.Cloud