
Epistemic status: The important content here is the claims. To illustrate the claims, I sometimes use examples that I didn't research very deeply, where I might get some facts wrong; feel free to treat these examples as fictional allegories.
In a recent exchange on X, I promised to write a post with my thoughts on what sorts of downstream problems interpretability researchers should try to apply their work to. But first, I want to explain why I think this question is important.
In this post, I will argue that interpretability researchers should demo downstream applications of their research as a means of validating their research. To be clear about what this claim means, here are different claims that I will not defend here:
Not my claim: Interpretability researchers should demo downstream applications of their research because we terminally care about these applications; researchers should just directly work on the [...]
---
Outline:
(02:30) Two interpretability fears
(07:21) Proposed solution: downstream applications
(11:04) Aside: fair fight vs. no-holds-barred vs. in the wild
(12:54) Conclusion
The original text contained 4 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.