Unsupervised Learning

Ep 74: Chief Scientist of Together.AI Tri Dao On The End of Nvidia's Dominance, Why Inference Costs Fell & The Next 10X in Speed



Fill out this short listener survey to help us improve the show: https://forms.gle/bbcRiPTRwKoG2tJx8

 

Tri Dao, Chief Scientist at Together AI and Princeton professor who created FlashAttention and Mamba, discusses how inference optimization has driven costs down 100x since ChatGPT's launch through memory optimization, sparsity advances, and hardware-software co-design. He predicts the AI hardware landscape will shift from Nvidia's current 90% dominance to a more diversified ecosystem within 2-3 years, as specialized chips emerge for three distinct workload categories: low-latency agentic systems, high-throughput batch processing, and interactive chatbots. Dao shares his surprise at AI models becoming genuinely useful for expert-level work: tools like Claude Code and o1 make him 1.5x more productive at GPU kernel optimization. The conversation explores whether current transformer architectures can reach expert-level AI performance, or whether approaches like mixture of experts and state space models are necessary to achieve AGI at reasonable cost. Looking ahead, Dao sees another 10x cost reduction coming from continued hardware specialization, improved kernels, and architectural advances like ultra-sparse models, while emphasizing that the biggest challenge remains generating expert-level training data for domains that lack extensive internet coverage.

 

(0:00) Intro

(1:58) Nvidia's Dominance and Competitors

(4:01) Challenges in Chip Design

(6:26) Innovations in AI Hardware

(9:21) The Role of AI in Chip Optimization

(11:38) Future of AI and Hardware Abstractions

(16:46) Inference Optimization Techniques

(33:10) Specialization in AI Inference

(35:18) Deep Work Preferences and Low Latency Workloads

(38:19) Fleet Level Optimization and Batch Inference

(39:34) Evolving AI Workloads and Open Source Tooling

(41:15) Future of AI: Agentic Workloads and Real-Time Video Generation

(44:35) Architectural Innovations and AI Expert Level

(50:10) Robotics and Multi-Resolution Processing

(52:26) Balancing Academia and Industry in AI Research

(57:37) Quickfire

 

With your co-hosts: 

@jacobeffron - Partner at Redpoint, Former PM Flatiron Health

@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn

@ericabrescia - Former COO GitHub, Founder Bitnami (acq’d by VMware)

@jordan_segall - Partner at Redpoint


Unsupervised Learning by Redpoint Ventures

4.9

49 ratings

