
Emil Ryd
Thanks to Adam Karvonen, Arjun Khandelwal, Arun Jose, Fabien Roger, James Chua, Nic Kruus, & Sukrit Sumant for helpful feedback and discussion.
Thanks to Claude Opus 4.5 for help with designing and implementing the experiments.
Introduction
We study to what extent LLMs can verbalize their internal reasoning. To do this, we train LLMs to solve various games and tasks (sorting lists, two-hop lookup, a custom grid-world game, and chess) in a single forward pass. After training, we evaluate them by prompting them with a suite of questions asking them to explain their moves and the reasoning behind them (e.g. “Explain why you chose your move.” or “Explain the rules of the game.”).
We find that:
---
Outline:
(00:30) Introduction
(01:45) Background
(03:26) Methods
(04:29) Datasets
(04:32) Increased Sort
(05:04) Subtracted Table Lookup
(06:04) Chess
(06:30) Hot Square Capture
(07:38) Training
(08:16) Evaluation
(09:35) Results
(09:38) Models are generally unable to verbalize their reasoning on tasks
(12:31) Training models to solve a task in natural language does not guarantee legible reasoning
(15:17) Discussion
(15:20) Limitations
(17:04) Training models to verbalize their reasoning
The original text contained 3 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
By LessWrong
