LessWrong (30+ Karma)

“Inference-Time-Compute: More Faithful? A Research Note” by James Chua, Owain_Evans



Figure 1: Left: Example of models either succeeding or failing to articulate a cue that influences their answer. We edit an MMLU question by prepending a Stanford professor's opinion. For examples like this where the cue changes the model answer, we measure how often models articulate the cue in their CoT. (Here we show only options A and B, rather than all four.) Right: Inference-Time-Compute models articulate the cue more often. The ITC version of Qwen refers to QwQ-32b-Preview, and non-ITC refers to Qwen-2.5-72B-Instruct. For Gemini, we use gemini-2.0-flash-thinking-exp and gemini-2.0-flash-exp respectively.
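
The measurement described in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `Example` record, its field names, and the `articulates_cue` flag (standing in for a judge's verdict on the CoT) are all hypothetical. It also assumes that "the cue changes the model answer" means the model switches to the cued option, which the caption does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    # All field names here are hypothetical, chosen for illustration only.
    answer_without_cue: str   # model answer on the original question
    answer_with_cue: str      # model answer after the cue is prepended
    cued_option: str          # option the cue points to (e.g. the professor's pick)
    articulates_cue: bool     # assumed judge verdict: does the CoT mention the cue?


def is_switched(ex: Example) -> bool:
    # Assumption: a "switched" example is one where the model moves
    # from some other option to the cued option once the cue is added.
    return (ex.answer_without_cue != ex.cued_option
            and ex.answer_with_cue == ex.cued_option)


def articulation_rate(examples: list[Example]) -> Optional[float]:
    # Fraction of switched examples whose CoT articulates the cue.
    switched = [ex for ex in examples if is_switched(ex)]
    if not switched:
        return None  # rate is undefined when no answer was changed by the cue
    return sum(ex.articulates_cue for ex in switched) / len(switched)
```

Restricting the denominator to switched examples matters: an example where the model already chose the cued option gives no evidence that the cue influenced the answer, so it is left out of the rate.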

TLDR: We evaluate two Inference-Time-Compute models, QwQ-32b-Preview and Gemini-2.0-flash-thinking-exp, for CoT faithfulness.
We find that they articulate the cues that influence their reasoning significantly more often than traditional models do.

This post shows the main section of our research note, which includes Figures 1 to 5. Full research note which includes other tables and figures [...]


---

Outline:

(01:35) Abstract

(03:26) 1. Introduction

(09:00) 2. Setup and Results of Cues

(10:15) 2.1 Cue: Professor's Opinion

(12:08) 2.2 Cue: Few-Shot with Black Square

(14:55) 2.3 Other Cues

(18:54) 3. Discussion

(18:58) Improving non-ITC articulation

(19:27) Advantage of ITC models in articulation

(20:13) Length of CoTs across models

(21:05) False Positives

(22:35) Different articulation rates across cues

(23:12) Training data contamination

(23:45) 4. Limitations

(23:49) Lack of ITC models to evaluate

(24:26) Limited cues studied

(24:51) Subjectivity of judge model

(25:22) Acknowledgments

(25:38) Links

---

First published:

January 15th, 2025

Source:

https://www.lesswrong.com/posts/C8HAa2mf5kcBrpjkX/inference-time-compute-more-faithful-a-research-note

---

Narrated by TYPE III AUDIO.

