LessWrong (30+ Karma)

“LLMs struggle to verbalize their internal reasoning” by Emil Ryd



Emil Ryd

Thanks to Adam Karvonen, Arjun Khandelwal, Arun Jose, Fabien Roger, James Chua, Nic Kruus, & Sukrit Sumant for helpful feedback and discussion.

Thanks to Claude Opus 4.5 for help with designing and implementing the experiments.

Introduction

We study to what extent LLMs can verbalize their internal reasoning. To do this, we train LLMs to solve various games and tasks (sorting lists, two-hop lookup, a custom grid-world game, and chess) in a single forward pass. After training, we evaluate them by prompting them with a suite of questions asking them to explain their moves and the reasoning behind them (e.g. “Explain why you chose your move.”, “Explain the rules of the game.”).
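As a rough illustration of this setup, here is a minimal sketch for the sorting task. The prompt format, the function names, and the list length are all assumptions made for illustration and are not taken from the post; only the two follow-up questions are quoted from it. The target contains the answer alone, with no intermediate reasoning, which is the property the training setup relies on.

```python
import random

# Two follow-up questions quoted in the text; the actual suite is larger.
FOLLOW_UP_QUESTIONS = [
    "Explain why you chose your move.",
    "Explain the rules of the game.",
]


def make_sort_example(rng, length=5, max_value=99):
    """Build one (prompt, target) pair for a hypothetical increasing-sort task.

    The target holds only the sorted list, so the model is never shown
    (or asked to produce) a written-out reasoning trace during training.
    """
    values = [rng.randint(0, max_value) for _ in range(length)]
    prompt = "Input: " + " ".join(map(str, values)) + "\nOutput:"
    target = " " + " ".join(map(str, sorted(values)))
    return prompt, target


def make_eval_prompts(prompt, model_answer):
    """Append each follow-up question to a finished task transcript."""
    return [f"{prompt}{model_answer}\n\n{q}" for q in FOLLOW_UP_QUESTIONS]


rng = random.Random(0)  # fixed seed so the example is reproducible
prompt, target = make_sort_example(rng)
eval_prompts = make_eval_prompts(prompt, target)
```

Each entry in `eval_prompts` would be sent to the trained model, and its free-text explanation compared against the rule that actually generated the data.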

We find that:

  1. Models trained to solve tasks in a single forward pass are not able to verbalize a correct reason for their actions[1]. Instead, they hallucinate incorrect reasoning.
  2. When trained to solve a very simple sorting task (sorting lists in increasing order), the models are able to verbalize the sorting rule, although unreliably. Furthermore, we believe this might be mostly due to the sorting rule being the most likely.
  3. When trained to solve a previously unseen task (grid-world game) with reasoning via RL [...]

---

Outline:

(00:30) Introduction

(01:45) Background

(03:26) Methods

(04:29) Datasets

(04:32) Increased Sort

(05:04) Subtracted Table Lookup

(06:04) Chess

(06:30) Hot Square Capture

(07:38) Training

(08:16) Evaluation

(09:35) Results

(09:38) Models are generally unable to verbalize their reasoning on tasks

(12:31) Training models to solve a task in natural language does not guarantee legible reasoning

(15:17) Discussion

(15:20) Limitations

(17:04) Training models to verbalize their reasoning

The original text contained 3 footnotes which were omitted from this narration.

---

First published:

February 14th, 2026

Source:

https://www.lesswrong.com/posts/dFRFxhaJkf9dE6Jfy/llms-struggle-to-verbalize-their-internal-reasoning

---

Narrated by TYPE III AUDIO.

