The Private AI Lab

013 - AI Resource Management Update & Tools with Frank Denneman

In this episode of The Private AI Lab, Frank Denneman returns as the first recurring guest to go deeper into one of the most misunderstood challenges in AI:


👉 Resource management for GPU workloads


Building on our previous conversation, this episode shifts from why it matters to how to actually design it right.

We dive into real-world challenges like GPU fragmentation, siloed capacity, and why traditional infrastructure thinking breaks down when AI enters the data center. Frank shares practical insights from his latest research, blog series, and tools—helping architects and platform engineers understand how to design efficient, scalable AI environments.


🔍 What you’ll learn in this episode


  • Why GPU workloads behave fundamentally differently from CPU/memory workloads

  • What GPU fragmentation really is (and why it kills utilization)

  • The difference between same-size vs mixed-mode placement

  • How placement IDs turn GPU scheduling into “Tetris”

  • Why “right-sizing” beats “perfect fitting” in AI environments

  • How to design a GPU profile catalog that actually scales

  • The role of state, agents, and storage in next-gen AI platforms


🔧 Tools & Resources mentioned


Frank created practical tools to help you design and validate your GPU environments:


  • 👉 vGPU Silo Capacity Calculator

    https://frankdenneman.ai/tools/vgpu-silo-capacity-calculator/

  • 👉 Same-size vs Mixed-mode Placement Tool

    https://frankdenneman.ai/tools/same-size-vs-mixed-mode/

  • 👉 Deep dive on unified memory & modern AI workloads

    https://frankdenneman.ai/posts/2026-03-23-understanding-unified-memory-dgx-spark-nemoclaw-nemotron/


Chapters:

00:00 Intro — Frank Denneman returns

01:30 AI hype vs real engineering

03:00 DGX Spark, NemoClaw & local AI agents

10:30 From LLMs to agents & stateful systems

12:00 Why AI infrastructure is different

15:00 What is GPU fragmentation?

19:30 Same-size vs mixed-mode placement

23:00 GPU “Tetris” and placement IDs explained

27:00 Right-sizing vs perfect fitting

32:00 The tools: capacity & placement simulation

36:00 GPU silos vs stranded capacity

41:00 Model sizing, KV cache & dynamic usage

48:00 Future of AI: smaller models & orchestration

55:00 AI-assisted coding & real-world impact

59:00 Key lessons learned

01:02:00 Closing thoughts





The Private AI Lab, by Johan van Amersfoort