Training Data

Microsoft CTO Kevin Scott on How Far Scaling Laws Will Extend



The current LLM era is the result of scaling model size (and the compute used to train models) in successive waves. It is also the result of each new generation of Nvidia GPUs delivering better-than-Moore’s-Law gains in price-performance. The largest platform companies continue to invest in scaling as the prime driver of AI innovation.


Are they right, or will marginal returns level off soon, leaving hyperscalers with too much hardware and too few customer use cases? To find out, we talk to Microsoft CTO Kevin Scott, who has led the company’s AI strategy for the past seven years. Scott describes himself as a “short-term pessimist, long-term optimist,” and he sees the scaling trend as durable for the industry and critical for establishing Microsoft’s AI platform.


Scott believes the compute ecosystem will shift from training to inference as frontier models continue to improve and serve a wider range of more reliable use cases. He also discusses emerging business models for training data, and even what ad units for autonomous agents might look like.


Hosted by: Pat Grady and Bill Coughran, Sequoia Capital


Mentioned:

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, the 2018 Google paper that convinced Kevin that Microsoft wasn’t moving fast enough on AI. 

Dennard scaling: The observation that as transistors shrink, their power density stays roughly constant, so power use scales with chip area; it broke down in the mid-2000s and is often confused with Moore’s Law.

Textbooks Are All You Need: Microsoft paper introducing phi-1, a large language model for code that reaches strong performance at a small size by training on higher-quality “textbook” data.

GPQA and MMLU: Benchmarks for reasoning.

Copilot: Microsoft’s product line of GPT-powered consumer assistants, spanning everything from general productivity to design, vacation planning, cooking and fitness.

Devin: Autonomous AI coding agent from Cognition Labs, with which Microsoft recently announced a partnership.

Ray Solomonoff: Participant in the 1956 Dartmouth Summer Research Project on Artificial Intelligence, which named the field; Kevin admires his prescience about the importance of probabilistic methods decades before others recognized it.


00:00 - Introduction

01:20 - Kevin’s backstory

06:56 - The role of PhDs in AI engineering

09:56 - Microsoft’s AI strategy

12:40 - Highlights and lowlights

16:28 - Accelerating investments

18:38 - The OpenAI partnership

22:46 - Soon inference will dwarf training

27:56 - Will the demand/supply balance change?

30:51 - Business models for data

36:54 - The value function

39:58 - Copilots

44:47 - The 98/2 rule

49:34 - Solving zero-sum games

57:13 - Lightning round
