Linear Digressions

Unfaithful Chain of Thought


What's actually happening when an LLM "thinks out loud"? Research on human decision-making suggests that much of the reasoning we believe drives our choices is actually post hoc rationalization — we decide first, explain later. Katie and Ben get curious about whether the same might be true for large language models: when you watch a model reason through a problem in real time, is that chain of thought the genuine process, or just a plausible-sounding story told after the fact? It's a deceptively deep question with real stakes for how much we should trust model explanations.
Miles Turpin et al., "Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting" (NeurIPS 2023, NYU and Anthropic): https://arxiv.org/abs/2305.04388
Anthropic, "Reasoning Models Don't Always Say What They Think" (Anthropic Alignment Science, 2025): https://www.anthropic.com/research/reasoning-models-dont-say-think

Linear Digressions, by Katie Malone

4.8 • 354 ratings