Linear Digressions

Unfaithful Chain of Thought


What's actually happening when an LLM "thinks out loud"? Research on human decision-making suggests that much of the reasoning we believe drives our choices is actually post hoc rationalization — we decide first, explain later. Katie and Ben get curious about whether the same might be true for large language models: when you watch a model reason through a problem in real time, is that chain of thought the genuine process, or just a plausible-sounding story told after the fact? It's a deceptively deep question with real stakes for how much we should trust model explanations.
Miles Turpin et al., "Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting" (NeurIPS 2023, NYU and Anthropic): https://arxiv.org/abs/2305.04388
Anthropic, "Reasoning Models Don't Always Say What They Think" (Anthropic Alignment Science, 2025): https://www.anthropic.com/research/reasoning-models-dont-say-think

Linear Digressions, by Katie Malone

4.8 • 354 ratings