
In this episode of The Private AI Lab, Frank Denneman returns as the first recurring guest to go deeper into one of the most misunderstood challenges in AI:
👉 Resource management for GPU workloads
Building on our previous conversation, this episode shifts from why it matters to how to actually design it right.
We dive into real-world challenges like GPU fragmentation, siloed capacity, and why traditional infrastructure thinking breaks down when AI enters the data center. Frank shares practical insights from his latest research, blog series, and tools—helping architects and platform engineers understand how to design efficient, scalable AI environments.
🔍 What you’ll learn in this episode
Why GPU workloads behave fundamentally differently from CPU/memory workloads
What GPU fragmentation really is (and why it kills utilization)
The difference between same-size and mixed-mode placement
How placement IDs turn GPU scheduling into “Tetris”
Why “right-sizing” beats “perfect fitting” in AI environments
How to design a GPU profile catalog that actually scales
The role of state, agents, and storage in next-gen AI platforms
🔧 Tools & Resources mentioned
Frank created practical tools to help you design and validate your GPU environments:
👉 vGPU Silo Capacity Calculator
https://frankdenneman.ai/tools/vgpu-silo-capacity-calculator/
👉 Same-size vs Mixed-mode Placement Tool
https://frankdenneman.ai/tools/same-size-vs-mixed-mode/
👉 Deep dive on unified memory & modern AI workloads
https://frankdenneman.ai/posts/2026-03-23-understanding-unified-memory-dgx-spark-nemoclaw-nemotron/
Chapters:
00:00 Intro — Frank Denneman returns
01:30 AI hype vs real engineering
03:00 DGX Spark, NemoClaw & local AI agents
10:30 From LLMs to agents & stateful systems
12:00 Why AI infrastructure is different
15:00 What is GPU fragmentation?
19:30 Same-size vs mixed-mode placement
23:00 GPU “Tetris” and placement IDs explained
27:00 Right-sizing vs perfect fitting
32:00 The tools: capacity & placement simulation
36:00 GPU silos vs stranded capacity
41:00 Model sizing, KV cache & dynamic usage
48:00 Future of AI: smaller models & orchestration
55:00 AI-assisted coding & real-world impact
59:00 Key lessons learned
01:02:00 Closing thoughts
By Johan van Amersfoort