December 19, 2025

“Scalable End-to-End Interpretability” by jsteinhardt

Listen Later

5 minutes

This is partly a linkpost for Predictive Concept Decoders, and partly a response to Neel Nanda's Pragmatic Vision for AI Interpretability and Leo Gao's Ambitious Vision for Interpretability.

There is currently somewhat of a debate in the interpretability community between pragmatic interpretability---grounding problems in empirically measurable safety tasks---and ambitious interpretability----obtaining a full bottom-up understanding of neural networks.

In my mind, these both get at something important but also both miss something. What they each get right:

Pragmatic interpretability identifies the need to ground in actual behaviors and data to make progress, and is closer to "going for the throat" in terms of solving specific problems like unfaithfulness.
Ambitious interpretability correctly notes that much of what goes on in neural networks is highly compositional, and that efficient explanations will need to leverage this compositional structure. It also more directly addresses gaps between internal process and outputs on a philosophical level.

On the other hand, pragmatic interpretability tends to underweight compositionality, while ambitious interpretability feels very indirect and potentially impossible.

I think a better approach is what I'll call scalable end-to-end interpretability. In this approach, we train end-to-end AI assistants to do interpretability for us, in such a way that [...]

---

First published:

December 18th, 2025

Source:

https://www.lesswrong.com/posts/qkhwh4AdG7kXgELCD/scalable-end-to-end-interpretability

---

Narrated by TYPE III AUDIO.

...more

View all episodes

View all episodes

Download on the App Store

Download on the App Store

Get it on Google Play

LessWrong (30+ Karma)

By LessWrong

December 19, 2025

“Scalable End-to-End Interpretability” by jsteinhardt

Listen Later

5 minutes

This is partly a linkpost for Predictive Concept Decoders, and partly a response to Neel Nanda's Pragmatic Vision for AI Interpretability and Leo Gao's Ambitious Vision for Interpretability.

There is currently somewhat of a debate in the interpretability community between pragmatic interpretability---grounding problems in empirically measurable safety tasks---and ambitious interpretability----obtaining a full bottom-up understanding of neural networks.

In my mind, these both get at something important but also both miss something. What they each get right:

Pragmatic interpretability identifies the need to ground in actual behaviors and data to make progress, and is closer to "going for the throat" in terms of solving specific problems like unfaithfulness.
Ambitious interpretability correctly notes that much of what goes on in neural networks is highly compositional, and that efficient explanations will need to leverage this compositional structure. It also more directly addresses gaps between internal process and outputs on a philosophical level.

On the other hand, pragmatic interpretability tends to underweight compositionality, while ambitious interpretability feels very indirect and potentially impossible.

I think a better approach is what I'll call scalable end-to-end interpretability. In this approach, we train end-to-end AI assistants to do interpretability for us, in such a way that [...]

---

First published:

December 18th, 2025

Source:

https://www.lesswrong.com/posts/qkhwh4AdG7kXgELCD/scalable-end-to-end-interpretability

---

Narrated by TYPE III AUDIO.

...more

More shows like LessWrong (30+ Karma)

The Daily by The New York Times

The Daily

112,284 Listeners

Astral Codex Ten Podcast by Jeremiah

Astral Codex Ten Podcast

131 Listeners

Interesting Times with Ross Douthat by New York Times Opinion

Interesting Times with Ross Douthat

7,247 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

560 Listeners

The Ezra Klein Show by New York Times Opinion

The Ezra Klein Show

16,302 Listeners

AI Article Readings by Readings of great articles in AI voices

AI Article Readings

4 Listeners

Doom Debates! by Liron Shapira

Doom Debates!

14 Listeners

LessWrong posts by zvi by zvi

LessWrong posts by zvi

2 Listeners