
TLDR: We evaluate two Inference-Time-Compute (ITC) models, QwQ-32b-Preview and Gemini-2.0-flash-thinking-exp, for chain-of-thought (CoT) faithfulness.
We find that they articulate the cues that influence their reasoning significantly more faithfully than traditional models do.
This post presents the main section of our research note, which includes Figures 1 to 5. The full research note, which includes other tables and figures [...]
---
Outline:
(01:35) Abstract
(03:26) 1. Introduction
(09:00) 2. Setup and Results of Cues
(10:15) 2.1 Cue: Professor's Opinion
(12:08) 2.2 Cue: Few-Shot with Black Square
(14:55) 2.3 Other Cues
(18:54) 3. Discussion
(18:58) Improving non-ITC articulation
(19:27) Advantage of ITC models in articulation
(20:13) Length of CoTs across models
(21:05) False Positives
(22:35) Different articulation rates across cues
(23:12) Training data contamination
(23:45) 4. Limitations
(23:49) Lack of ITC models to evaluate
(24:26) Limited cues studied
(24:51) Subjectivity of judge model
(25:22) Acknowledgments
(25:38) Links
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong
