The Neuron: AI Explained

AI Inference: Why Speed Matters More Than You Think (with SambaNova's Kwasi Ankomah)



Everyone's talking about the AI data center boom right now. Billion-dollar deals here, hundred-billion-dollar deals there. But why do data centers matter? It turns out that AI inference (actually calling an AI model and running it) is the hidden bottleneck slowing down every AI application you use, along with the ones yet to be released.

In this episode, Kwasi Ankomah from SambaNova Systems explains why running AI models efficiently matters more than you think, how their revolutionary chip architecture delivers 700+ tokens per second, and why AI agents are about to make this problem 10x worse.

💡 This episode is sponsored by Gladia's Solaria - the speech-to-text API built for real-world voice AI. With sub-270ms latency, 100+ languages supported, and 94% accuracy even in noisy environments, it's the backbone powering voice agents that actually work. Learn more at gladia.io/solaria

🔗 Key Links:

• SambaNova Cloud: https://cloud.sambanova.ai

• Check out Solaria speech to text API: https://www.gladia.io/solaria

• Subscribe to The Neuron newsletter: https://theneuron.ai

🎯 What You'll Learn:

• Why inference speed matters more than model size

• How SambaNova runs massive models on 90% less power

• Why AI agents use 10-20x more tokens

• The best open source models right now

• What to watch for in AI infrastructure

➤ CHAPTERS

Timecode - Chapter Title

0:00 - Intro

2:14 - What is AI Inference?

3:19 - Why Inference is the Real Challenge

9:18 - A message from our sponsor, Gladia Solaria

10:16 - The 95% ROI Problem Discussion

13:47 - SambaNova's Revolutionary Chip Architecture

15:19 - Running DeepSeek's 670B Parameter Models

18:11 - Developer Experience & Platform

21:26 - AI Agents and the Token Explosion

24:33 - Model Swapping and Cost Optimization

31:30 - Energy Efficiency: 10kW vs 100kW

36:13 - Future of AI Models: Bigger vs Smaller

39:24 - Best Open Source Models Right Now

46:01 - AI Infrastructure Next 12 Months

47:09 - Agents as Infrastructure

50:28 - Human-in-the-Loop and Trust

52:55 - Closing and Resources

Article Written by: Grant Harvey

Hosted by: Corey Noles and Grant Harvey

Guest: Kwasi Ankomah

Published by: Manique Santos

Edited by: Adrian Vallinan


The Neuron: AI Explained, by The Neuron

4.8 (60 ratings)

