December 05, 2024

Interviewing Finbarr Timbers on the "We are So Back" Era of Reinforcement Learning

1 hour 8 minutes

Finbarr Timbers is an AI researcher who writes Artificial Fintelligence, one of the technical AI blogs I’ve been recommending for a long time, and has experience at top AI labs including DeepMind and Midjourney. The goal of this interview was to do a few things:

* Revisit what reinforcement learning (RL) actually is, its origins, and its motivations.

* Contextualize the major breakthroughs of deep RL in the last decade, from DQN for Atari to AlphaZero to ChatGPT. How could we have seen the resurgence coming? (see the timeline below for the major events we cover)

* Discuss modern uses for RL, including o1 and RLHF, and the future of finetuning all ML models.

* Address some of the critiques like “RL doesn’t work yet.”

It was a fun one. Listen on Apple Podcasts, Spotify, YouTube, and wherever you get your podcasts. For other Interconnects interviews, go here.

Timeline of RL and what was happening at the time

In the last decade of deep RL, there have been a few phases.

* Era 1: Deep RL fundamentals — when modern algorithms were designed and proven.

* Era 2: Major projects — AlphaZero, OpenAI Five, and all the projects that put RL on the map.

* Era 3: Slowdown — when DeepMind and OpenAI no longer had the major RL projects and cultural relevance declined.

* Era 4: RLHF & widening success — RL’s new life post ChatGPT.

The following events span these eras. The list is incomplete, but enough to inspire a conversation.

Early era: TD-Gammon, REINFORCE, etc.

2013: Deep Q Learning (Atari)

2014: Google acquires DeepMind

2016: AlphaGo defeats Lee Sedol

2017: PPO paper, AlphaZero (no human data)

2018: OpenAI Five, GPT-2

2019: AlphaStar, early papers on robotic sim2real with RL (see blog post)

2020: MuZero

2021: Decision Transformer

2022: ChatGPT, sim2real continues

2023: Scaling laws for RL (blog post), doubts about RL

2024: o1, post-training, RL’s bloom

Interconnects is a reader-supported publication. Consider becoming a subscriber.

Chapters

* [00:00:00] Introduction

* [00:02:14] Reinforcement Learning Fundamentals

* [00:09:03] The Bitter Lesson

* [00:12:07] Reward Modeling and Its Challenges in RL

* [00:16:03] Historical Milestones in Deep RL

* [00:21:18] OpenAI Five and Challenges in Complex RL Environments

* [00:25:24] Recent-ish Developments in RL: MuZero, Decision Transformer, and RLHF

* [00:30:29] OpenAI's o1 and Exploration in Language Models

* [00:40:00] Tülu 3 and Challenges in RL Training for Language Models

* [00:46:48] Comparing Different AI Assistants

* [00:49:44] Management in AI Research

* [00:55:30] Building Effective AI Teams

* [01:01:55] The Need for Personal Branding

We mention

* o1 (OpenAI model)

* Rich Sutton

* University of Alberta

* London School of Economics

* IBM’s Deep Blue

* Alberta Machine Intelligence Institute (AMII)

* John Schulman

* Claude (Anthropic's AI assistant)

* Logan Kilpatrick

* Bard (Google's AI assistant)

* DeepSeek R1 Lite

* Scale AI

* OLMo (AI2's language model)

* Golden Gate Claude

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe

By Nathan Lambert
