What's Up with Tech?

Inside AMD’s AI Strategy From Edge To Data Center



Interested in being a guest? Email us at [email protected]

Big leaps in AI rarely come from one breakthrough. They emerge when hardware design, open software, and real workloads click into place. That’s the story we unpack with AMD’s Ramine Roane: how an open, developer-first approach combined with high-bandwidth memory, chiplet packaging, and a re-architected software stack is reshaping performance and cost from the edge to the largest data centers.

We walk through why memory capacity and bandwidth dominate large language model performance, and how MI300X's 192 GB HBM and advanced packaging unlock bigger contexts and faster token throughput. Ramine explains how ROCm 7 was rebuilt to be modular, smaller to install, and enterprise-ready—so teams can go from single-node experiments to fully orchestrated clusters using Kubernetes, Slurm, and familiar open tools. The highlight: disaggregated and distributed inference. By splitting prefill from decode and adopting expert parallelism, organizations are slashing cost per token by 10–30x, depending on model and topology.
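To see why memory bandwidth dominates decode, here is a back-of-envelope sketch: each generated token must stream roughly all model weights from HBM, so batch-1 decode throughput is capped near bandwidth divided by weight bytes. The bandwidth figure and model size below are illustrative assumptions for this sketch, not specs discussed in the episode.

```python
# Back-of-envelope: memory-bandwidth-bound LLM decode throughput.
# Each decoded token reads (roughly) every model weight from HBM once,
# so batch-1 throughput is capped near bandwidth / weight bytes.

def max_decode_tokens_per_sec(hbm_bandwidth_bytes_per_sec: float,
                              weight_bytes: float) -> float:
    """Upper bound on batch-1 decode rate for a memory-bound model."""
    return hbm_bandwidth_bytes_per_sec / weight_bytes

# A 70B-parameter model in fp16 needs ~140 GB of weights, so it fits in
# MI300X's 192 GB of HBM on a single device. Bandwidth here is an
# assumed illustrative figure.
weights = 70e9 * 2   # fp16 = 2 bytes per parameter -> 140 GB
bw = 5.3e12          # assumed HBM bandwidth in bytes/s
print(f"~{max_decode_tokens_per_sec(bw, weights):.0f} tokens/s upper bound")
```

This is also why disaggregating prefill (compute-bound, parallel over the prompt) from decode (bandwidth-bound, one token at a time) improves utilization: each phase can run on hardware provisioned for its actual bottleneck.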

The conversation ranges from startup-friendly workflows to hyperscaler deployments, with practical insight into vLLM, SGLang, and why open source now outpaces closed stacks. We also look ahead at where inference runs: the edge is rising. With performance per watt doubling on a steady cadence, AI PCs, laptops, and phones will take on more of the work, enabling privacy, responsiveness, and lower costs. Ramine shares a sober view on quantum computing timelines and a bullish take on the broader compute shift—moving once-sequential problems into massively parallel deep learning that changes what's even possible.

If you care about real performance, total cost of ownership, and developer velocity, this conversation brings a grounded blueprint: open ecosystems, smarter packaging, and inference architectures built for high utilization. Subscribe, share with a colleague who cares about LLM throughput and cost, and leave a quick review to help others find the show.



More at https://linktr.ee/EvanKirstel


What's Up with Tech? By Evan Kirstel