Training Data

Reflection AI’s Misha Laskin on the AlphaGo Moment for LLMs



LLMs are democratizing digital intelligence, but we’re all waiting for AI agents to take this to the next level by planning tasks and executing actions to actually transform the way we work and live our lives. 


Yet despite the incredible hype around AI agents, we’re still far from that “tipping point” with today’s best-in-class models. As one measure: coding agents now score in the high teens (percent) on the SWE-bench benchmark for resolving GitHub issues, far above the previous unassisted baseline of 2% and the assisted baseline of 5%, but we’ve still got a long way to go.


Why is that? What do we need to truly unlock agentic capability for LLMs? What can we learn from researchers who have built both the most powerful agents in the world, like AlphaGo, and the most powerful LLMs in the world? 


To find out, we’re talking to Misha Laskin, former research scientist at DeepMind. Misha is embarking on his vision to build the best agent models by bringing the search capabilities of RL together with LLMs at his new company, Reflection AI. He and his cofounder Ioannis Antonoglou, co-creator of AlphaGo and AlphaZero and RLHF lead for Gemini, are leveraging their unique insights to train the most reliable models for developers building agentic workflows.


Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital 


00:00 Introduction

01:11 Leaving Russia, discovering science

10:01 Getting into AI with Ioannis Antonoglou

15:54 Reflection AI and agents

25:41 The current state of AI agents

29:17 AlphaGo, AlphaZero and Gemini

32:58 LLMs don’t have a ground truth reward

37:53 The importance of post-training

44:12 Task categories for agents

45:54 Attracting talent

50:52 How far away are capable agents?

56:01 Lightning round


Mentioned: 


  • The Feynman Lectures on Physics: The classic text that got Misha interested in science.
  • Mastering the game of Go with deep neural networks and tree search: The original 2016 AlphaGo paper.
  • Mastering the game of Go without human knowledge: The 2017 AlphaGo Zero paper.
  • Scaling Laws for Reward Model Overoptimization: OpenAI paper on how reward models can be gamed at all scales for all algorithms.
  • Mapping the Mind of a Large Language Model: Article about the Anthropic mechanistic interpretability paper that identifies how millions of concepts are represented inside Claude Sonnet.
  • Pieter Abbeel: Berkeley professor and founder of Covariant, with whom Misha studied.
  • A2C and A3C: Advantage Actor Critic and Asynchronous Advantage Actor Critic, the two algorithms developed by Misha’s manager at DeepMind, Volodymyr Mnih, that helped define deep reinforcement learning.

Training Data by Sequoia Capital
