Inference Time Tactics

From GPU Scarcity to GPU Waste: Solving the Utilization Crisis


In this episode of Inference Time Tactics, Cooper and Byron sit down with Charlie and Anil from Rapt AI to tackle one of the industry's most expensive problems: GPU underutilization. With half a trillion dollars invested in GPU infrastructure running at just 20-30% utilization, Rapt AI is building AI-powered orchestration that automatically analyzes workloads and matches them to the right compute resources—no guesswork required.

We talked about:

  • Why half a trillion dollars in GPU infrastructure runs at only 20-30% utilization—and how a 5% drop costs $200,000 per $2M investment. 
  • How Rapt AI's platform continuously analyzes workloads and auto-optimizes GPU allocation, letting customers run 4-14 models per GPU. 
  • Real results: moving workloads from H100s to A100s at 40% of the cost, and shrinking a GPU footprint from 184 GPUs to under 50 while improving performance. 
  • Why 2026 becomes the year of inference as agentic workloads create unprecedented infrastructure chaos. 
  • The shift from supply problems to optimization problems—and why abstraction layers matter across multi-vendor environments. 
  • Power as the next crisis: tokens-per-watt emerging as the critical metric alongside tokens-per-dollar. 
  • How intelligent orchestration frees data scientists and MLOps teams from infrastructure tuning so they can focus on AI innovation.


    Connect with Rapt AI:

    Website: https://www.rapt.ai/ 

    LinkedIn (Anil Ravindranath): https://www.linkedin.com/in/anilravindranath 

    LinkedIn (Charlie Leeming): https://www.linkedin.com/in/charlieleeming/ 



    Connect with Neurometric:

    Website: https://www.neurometric.ai/ 

    Substack: https://neurometric.substack.com/ 

    X: https://x.com/neurometric/ 

    Bluesky: https://bsky.app/profile/neurometric.bsky.social



    Hosts:

    Calvin Cooper

    https://x.com/cooper_nyc_ 

    https://www.linkedin.com/in/coopernyc

    Byron Galbraith

    https://x.com/bgalbraith 

    https://www.linkedin.com/in/byrongalbraith


    Inference Time Tactics by NeuroMetric AI