Unsupervised Learning with Jacob Effron

Ep 74: Chief Scientist of Together.AI Tri Dao On The End of Nvidia's Dominance, Why Inference Costs Fell & The Next 10X in Speed


Fill out this short listener survey to help us improve the show: https://forms.gle/bbcRiPTRwKoG2tJx8

 

Tri Dao, Chief Scientist at Together AI and Princeton professor who created FlashAttention and Mamba, discusses how inference optimization has driven costs down 100x since ChatGPT's launch through memory optimization, sparsity advances, and hardware-software co-design. He predicts that the AI hardware landscape will shift from Nvidia's current 90% market dominance to a more diversified ecosystem within 2-3 years, as specialized chips emerge for three distinct workload categories: low-latency agentic systems, high-throughput batch processing, and interactive chatbots. Dao shares his surprise at AI models becoming genuinely useful for expert-level work: tools like Claude Code and o1 make him 1.5x more productive at GPU kernel optimization. The conversation explores whether current transformer architectures can reach expert-level AI performance, or whether approaches like mixture of experts and state space models are necessary to achieve AGI at reasonable cost. Looking ahead, Dao sees another 10x cost reduction coming from continued hardware specialization, improved kernels, and architectural advances like ultra-sparse models, while emphasizing that the biggest challenge remains generating expert-level training data for domains that lack extensive internet coverage.

 

(0:00) Intro

(1:58) Nvidia's Dominance and Competitors

(4:01) Challenges in Chip Design

(6:26) Innovations in AI Hardware

(9:21) The Role of AI in Chip Optimization

(11:38) Future of AI and Hardware Abstractions

(16:46) Inference Optimization Techniques

(33:10) Specialization in AI Inference

(35:18) Deep Work Preferences and Low Latency Workloads

(38:19) Fleet Level Optimization and Batch Inference

(39:34) Evolving AI Workloads and Open Source Tooling

(41:15) Future of AI: Agentic Workloads and Real-Time Video Generation

(44:35) Architectural Innovations and AI Expert Level

(50:10) Robotics and Multi-Resolution Processing

(52:26) Balancing Academia and Industry in AI Research

(57:37) Quickfire

 

With your co-hosts: 

@jacobeffron 

- Partner at Redpoint, Former PM Flatiron Health 

@patrickachase 

- Partner at Redpoint, Former ML Engineer LinkedIn 

@ericabrescia 

- Former COO GitHub, Founder Bitnami (acquired by VMware) 

@jordan_segall 

- Partner at Redpoint


Unsupervised Learning with Jacob Effron, by Redpoint Ventures

4.9

49 ratings

