Intellectually Curious

Goodput, Not Just Throughput: Prefill, Decode, and Rethinking AI Inference



We unpack a core bottleneck in streaming AI inference: the mismatch between compute-heavy prefill and fast but memory-bandwidth-bound decode. From chunked prefill to physical separation (DistServe) and logical isolation (DuetServe), we explore how phase isolation eliminates prefill-decode interference, with reported goodput gains of 2x–4.5x and markedly better cost efficiency. Join us as we translate GPU architecture ideas into scalable, user-friendly AI services, with practical takeaways for builders, operators, and decision-makers.


Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC


Intellectually Curious, by Mike Breault