
Summary
Motivation
There are many examples of unfaithful LLM reasoning, where the answer doesn't follow from the reasoning and the reasoning is instead just a rationalization for the answer. For example, Turpin et al. 2023 show LLMs rationalizing sycophantic and stereotypical answers. However, these examples involve rather simple hidden reasoning. What would be most worrying is LLMs doing complex [...]
---
Outline:
(00:05) Summary
(00:49) Motivation
(02:15) Toy task for hidden serial reasoning
(03:37) Experiments
(06:15) Bonus experiment 1 - Is non-linearity required for hidden serial reasoning?
(06:53) Bonus experiment 2 - Do more layers enable longer hidden reasoning in transformers?
(07:39) Caveats
The original text contained 3 footnotes which were omitted from this narration.
The original text contained 6 images which were described by AI.
---
First published:
Source:
Narrated by TYPE III AUDIO.