Training Data

OpenAI's Noam Brown, Ilge Akkaya and Hunter Lightman on o1 and Teaching LLMs to Reason Better



Combining LLMs with AlphaGo-style deep reinforcement learning has been a holy grail for many leading AI labs, and with o1 (aka Strawberry) we are seeing the most general merging of the two modes to date. o1 is admittedly better at math than essay writing, but it has already achieved SOTA on a number of math, coding and reasoning benchmarks.

Deep RL legend and now OpenAI researcher Noam Brown and teammates Ilge Akkaya and Hunter Lightman discuss the aha moments on the way to the release of o1, how it uses chains of thought and backtracking to think through problems, the discovery of strong test-time compute scaling laws, and what to expect as the model gets better.

Hosted by: Sonya Huang and Pat Grady, Sequoia Capital 

Mentioned in this episode:

  • Learning to Reason with LLMs: Technical report accompanying the launch of OpenAI o1.
  • Generator verifier gap: Concept Noam explains in terms of what kinds of problems benefit from more inference-time compute.
  • Agent57: Outperforming the human Atari benchmark, 2020 paper where DeepMind demonstrated “the first deep reinforcement learning agent to obtain a score that is above the human baseline on all 57 Atari 2600 games.”
  • Move 37: Pivotal move in AlphaGo’s second game against Lee Sedol, so surprising that Sedol thought it must be a mistake; only later did he discover he had lost the game to a superhuman move.
  • IOI competition: OpenAI entered o1 into the International Olympiad in Informatics and received a Silver Medal.
  • System 1, System 2: The thesis of Daniel Kahneman’s pivotal book of behavioral economics, Thinking, Fast and Slow, which posited two distinct modes of thought, with System 1 being fast and instinctive and System 2 being slow and rational.
  • AlphaZero: The successor to AlphaGo, which learned a variety of games completely from scratch through self-play. Interestingly, self-play doesn’t seem to have a role in o1.
  • Solving Rubik’s Cube with a robot hand: Early OpenAI robotics paper that Ilge Akkaya worked on.
  • The Last Question: Science fiction story by Isaac Asimov with interesting parallels to scaling inference-time compute.
  • Strawberry: Why?
  • o1-mini: A smaller, more efficient version of o1 for applications that require reasoning without broad world knowledge.


    00:00 - Introduction

    01:33 - Conviction in o1

    04:24 - How o1 works

    05:04 - What is reasoning?

    07:02 - Lessons from gameplay

    09:14 - Generation vs verification

    10:31 - What is surprising about o1 so far

    11:37 - The trough of disillusionment

    14:03 - Applying deep RL

    14:45 - o1’s AlphaGo moment?

    17:38 - A-ha moments

    21:10 - Why is o1 good at STEM?

    24:10 - Capabilities vs usefulness

    25:29 - Defining AGI

    26:13 - The importance of reasoning

    28:39 - Chain of thought

    30:41 - Implication of inference-time scaling laws

    35:10 - Bottlenecks to scaling test-time compute

    38:46 - Biggest misunderstanding about o1?

    41:13 - o1-mini

    42:15 - How should founders think about o1?

    Training Data by Sequoia Capital
