Latent Space: The AI Engineer Podcast

How Zyphra went all-in on AMD + Why Devs feel faster with AI but are slower — with Quentin Anthony



OpenAI recently made waves by becoming the first big model lab to commit to a hyperscale AMD cluster (together with their own Titan XPUs), giving AMD the first big-lab silicon win outside of NVIDIA/Google. Returning guest Quentin Anthony, Head of Model Training at Zyphra and advisor at EleutherAI, recently made the same transition. In Part 1 of this pod, Quentin describes his journey from working on Oak Ridge National Lab's Frontier supercomputer to leading Zyphra's ambitious move to AMD MI300X GPUs, where they are achieving performance that beats NVIDIA H100s on certain workloads while dramatically reducing costs. The discussion dives deep into the technical challenges of kernel development, with Quentin explaining why he often bypasses high-level frameworks like Triton to write directly in ROCm or even GPU assembly when necessary. He reveals how Zyphra's hybrid transformer-Mamba models like Zamba 2 can match Llama 3 8B performance at 7B parameters, and how they are optimized specifically for edge deployment across a spectrum from 1.2B models for phones to 7B for desktops.

In Part 2, Quentin candidly discusses his experience in the controversial METR software engineering productivity study, which found that developers felt 20% faster while using AI coding tools but were in fact 20% slower. Quentin was one of the few developers who showed a measurable speedup from AI tools. He shares practical insights on avoiding the "slot machine effect" of endlessly prompting models, the importance of staying aware of context rot, and why he prefers direct API access over tools like Cursor to maintain complete control over model context. The conversation also covers the state of open source AI research, with Quentin arguing that siloed, focused teams with guaranteed funding produce better results than grand collaborative efforts. He explains why kernel datasets alone won't solve the GPU programming problem, the challenges of evaluating kernel quality, and why companies should invest more in ecosystem development than in traditional marketing.

https://www.linkedin.com/in/quentin-anthony/

https://www.zyphra.com/post/zamba2-7b

Key Topics:
  • AMD MI300X advantages: 192GB VRAM, superior memory bandwidth
  • Writing kernels from PTX/AMD GCN assembly up through CUDA/ROCm
  • Hybrid attention-Mamba architectures and optimal sparsity ratios
  • The METR productivity study: achieving positive AI speedup
  • Context rot and why shorter conversations beat long threads
  • Why physicists make great ML engineers ("embryonic stem cells")
  • Edge deployment strategies from phones to local clusters
  • The future of on-device vs cloud inference routing
  • EleutherAI's focus on interpretability with fully open pipelines
  • Building velocity-focused teams over position-based hiring


Latent Space: The AI Engineer Podcast, by swyx + Alessio

4.7

86 ratings

