Theo Jaffee Podcast

#5: Quintin Pope - AI alignment, machine learning, failure modes, and reasons for optimism



Quintin Pope is a machine learning researcher focusing on natural language modeling and AI alignment. Among alignment researchers, Quintin stands out for his optimism. He believes that AI alignment is far more tractable than it seems, and that we appear to be on a good path to making the future great. On LessWrong, he's written one of the most popular posts of the last year, “My Objections to ‘We’re All Gonna Die with Eliezer Yudkowsky’”, as well as many other highly upvoted posts on various alignment papers and on his own theory of alignment, shard theory.

  • Quintin’s Twitter: https://twitter.com/QuintinPope5

  • Quintin’s LessWrong profile: https://www.lesswrong.com/users/quintin-pope

  • My Objections to “We’re All Gonna Die with Eliezer Yudkowsky”: https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky

  • The Shard Theory Sequence: https://www.lesswrong.com/s/nyEFg3AuJpdAozmoX

  • Quintin’s Alignment Papers Roundup: https://www.lesswrong.com/s/5omSW4wNKbEvYsyje

  • Evolution provides no evidence for the sharp left turn: https://www.lesswrong.com/posts/hvz9qjWyv8cLX9JJR/evolution-provides-no-evidence-for-the-sharp-left-turn

  • Deep Differentiable Logic Gate Networks: https://arxiv.org/abs/2210.08277

  • The Hydra Effect: Emergent Self-repair in Language Model Computations: https://arxiv.org/abs/2307.15771

  • Deep learning generalizes because the parameter-function map is biased towards simple functions: https://arxiv.org/abs/1805.08522

  • Bridging RL Theory and Practice with the Effective Horizon: https://arxiv.org/abs/2304.09853

  • PODCAST LINKS:

    • Video Transcript: https://www.theojaffee.com/p/5-quintin-pope

    • Spotify: https://open.spotify.com/show/1IJRtB8FP4Cnq8lWuuCdvW?si=eba62a72e6234efb

    • Apple Podcasts: https://podcasts.apple.com/us/podcast/theo-jaffee-podcast/id1699912677

    • RSS: https://api.substack.com/feed/podcast/989123/s/75569/private/129f6344-c459-4581-a9da-dc331677c2f6.rss

    • Playlist of all episodes: https://www.youtube.com/playlist?list=PLVN8-zhbMh9YnOGVRT9m0xzqTNGD_sujj

    • My Twitter: https://x.com/theojaffee

    • My Substack: https://www.theojaffee.com

  • CHAPTERS:

      Introduction (0:00)

      What Is AGI? (1:03)

      What Can AGI Do? (12:49)

      Orthogonality (23:14)

      Mind Space (42:50)

      Quintin’s Background and Optimism (55:06)

      Mesa-Optimization and Reward Hacking (1:02:48)

      Deceptive Alignment (1:11:52)

      Shard Theory (1:24:10)

      What Is Alignment? (1:30:05)

      Misalignment and Evolution (1:37:21)

      Mesa-Optimization and Reward Hacking, Part 2 (1:46:56)

      RL Agents (1:55:02)

      Monitoring AIs (2:09:29)

      Mechanistic Interpretability (2:14:00)

      AI Disempowering Humanity (2:28:13)

      By Theo Jaffee