
In this episode of The Private AI Lab, Frank Denneman returns as the first recurring guest to go deeper into one of the most misunderstood challenges in AI:
👉 Resource management for GPU workloads
Building on our previous conversation, this episode shifts from why it matters to how to actually design it right.
We dive into real-world challenges like GPU fragmentation, siloed capacity, and why traditional infrastructure thinking breaks down when AI enters the data center. Frank shares practical insights from his latest research, blog series, and tools—helping architects and platform engineers understand how to design efficient, scalable AI environments.
🔍 What you’ll learn in this episode
Why GPU workloads behave fundamentally differently from CPU/memory workloads
What GPU fragmentation really is (and why it kills utilization)
The difference between same-size and mixed-mode placement
How placement IDs turn GPU scheduling into “Tetris”
Why “right-sizing” beats “perfect fitting” in AI environments
How to design a GPU profile catalog that actually scales
The role of state, agents, and storage in next-gen AI platforms
🔧 Tools & Resources mentioned
Frank created practical tools to help you design and validate your GPU environments:
👉 vGPU Silo Capacity Calculator
https://frankdenneman.ai/tools/vgpu-silo-capacity-calculator/
👉 Same-size vs Mixed-mode Placement Tool
https://frankdenneman.ai/tools/same-size-vs-mixed-mode/
👉 Deep dive on unified memory & modern AI workloads
https://frankdenneman.ai/posts/2026-03-23-understanding-unified-memory-dgx-spark-nemoclaw-nemotron/
Chapters:
00:00 Intro — Frank Denneman returns
01:30 AI hype vs real engineering
03:00 DGX Spark, NemoClaw & local AI agents
10:30 From LLMs to agents & stateful systems
12:00 Why AI infrastructure is different
15:00 What is GPU fragmentation?
19:30 Same-size vs mixed-mode placement
23:00 GPU “Tetris” and placement IDs explained
27:00 Right-sizing vs perfect fitting
32:00 The tools: capacity & placement simulation
36:00 GPU silos vs stranded capacity
41:00 Model sizing, KV cache & dynamic usage
48:00 Future of AI: smaller models & orchestration
55:00 AI-assisted coding & real-world impact
59:00 Key lessons learned
01:02:00 Closing thoughts
By Johan van Amersfoort