


Epistemic status: The important content here is the claims. To illustrate the claims, I sometimes use examples that I didn't research very deeply, where I might get some facts wrong; feel free to treat these examples as fictional allegories.
In a recent exchange on X, I promised to write a post with my thoughts on what sorts of downstream problems interpretability researchers should try to apply their work to. But first, I want to explain why I think this question is important.
In this post, I will argue that interpretability researchers should demo downstream applications of their research as a means of validating their research. To be clear about what this claim means, here are different claims that I will not defend here:
Not my claim: Interpretability researchers should demo downstream applications of their research because we terminally care about these applications; researchers should just directly work on the [...]
---
Outline:
(02:30) Two interpretability fears
(07:21) Proposed solution: downstream applications
(11:04) Aside: fair fight vs. no-holds-barred vs. in the wild
(12:54) Conclusion
The original text contained 4 footnotes which were omitted from this narration.
---
Narrated by TYPE III AUDIO.
By LessWrong
