September 29, 2024

“the case for CoT unfaithfulness is overstated” by nostalgebraist

Listen Later

21 minutes

[Meta note: quickly written, unpolished. Also, it's possible that there's some more convincing work on this topic that I'm unaware of – if so, let me know]

In research discussions about LLMs, I often pick up a vibe of casual, generalized skepticism about model-generated CoT (chain-of-thought) explanations.

CoTs (people say) are not trustworthy in general. They don't always reflect what the model is "actually" thinking or how it has "actually" solved a given problem.

This claim is true as far as it goes. But people sometimes act like it goes much further than (IMO) it really does.

Sometimes it seems to license an attitude of "oh, it's no use reading what the model says in the CoT, you're a chump if you trust that stuff." Or, more insidiously, a failure to even ask the question "what, if anything, can we learn about the model's reasoning process by reading the [...]

The original text contained 1 footnote which was omitted from this narration.

---

First published:

September 29th, 2024

Source:

https://www.lesswrong.com/posts/HQyWGE2BummDCc2Cx/the-case-for-cot-unfaithfulness-is-overstated

---

Narrated by TYPE III AUDIO.

...more

View all episodes

View all episodes

Download on the App Store

Download on the App Store

Get it on Google Play

LessWrong (30+ Karma)

By LessWrong

September 29, 2024

“the case for CoT unfaithfulness is overstated” by nostalgebraist

Listen Later

21 minutes

[Meta note: quickly written, unpolished. Also, it's possible that there's some more convincing work on this topic that I'm unaware of – if so, let me know]

In research discussions about LLMs, I often pick up a vibe of casual, generalized skepticism about model-generated CoT (chain-of-thought) explanations.

CoTs (people say) are not trustworthy in general. They don't always reflect what the model is "actually" thinking or how it has "actually" solved a given problem.

This claim is true as far as it goes. But people sometimes act like it goes much further than (IMO) it really does.

Sometimes it seems to license an attitude of "oh, it's no use reading what the model says in the CoT, you're a chump if you trust that stuff." Or, more insidiously, a failure to even ask the question "what, if anything, can we learn about the model's reasoning process by reading the [...]

The original text contained 1 footnote which was omitted from this narration.

---

First published:

September 29th, 2024

Source:

https://www.lesswrong.com/posts/HQyWGE2BummDCc2Cx/the-case-for-cot-unfaithfulness-is-overstated

---

Narrated by TYPE III AUDIO.

...more

More shows like LessWrong (30+ Karma)

The Daily by The New York Times

The Daily

113,075 Listeners

Astral Codex Ten Podcast by Jeremiah

Astral Codex Ten Podcast

132 Listeners

Interesting Times with Ross Douthat by New York Times Opinion

Interesting Times with Ross Douthat

7,268 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

560 Listeners

The Ezra Klein Show by New York Times Opinion

The Ezra Klein Show

16,492 Listeners

AI Article Readings by Readings of great articles in AI voices

AI Article Readings

4 Listeners

Doom Debates! by Liron Shapira

Doom Debates!

14 Listeners

LessWrong posts by zvi by zvi

LessWrong posts by zvi

2 Listeners