
This is a blogpost version of a talk I gave earlier this year at GDM.
Epistemic status: Vague and handwavy. Nuance is often missing. Some of the claims depend on implicit definitions that may be reasonable to disagree with. But overall I think it's directionally true.
It's often said that mech interp is pre-paradigmatic.
I think it's worth being skeptical of this claim.
In this post I argue that:
Preamble: Kuhn, paradigms, and paradigm shifts
First, we need to be familiar with the basic definition of a paradigm:
A paradigm is a distinct set of concepts or thought patterns, including theories, research [...]
---
Outline:
(00:58) Preamble: Kuhn, paradigms, and paradigm shifts
(03:56) Claim: Mech Interp is Not Pre-paradigmatic
(07:56) First-Wave Mech Interp (ca. 2012 - 2021)
(10:21) The Crisis in First-Wave Mech Interp
(11:21) Second-Wave Mech Interp (ca. 2022 - ??)
(14:23) Anomalies in Second-Wave Mech Interp
(17:10) The Crisis of Second-Wave Mech Interp (ca. 2025 - ??)
(18:25) Toward Third-Wave Mechanistic Interpretability
(20:28) The Basics of Parameter Decomposition
(22:40) Parameter Decomposition Questions Foundational Assumptions of Second-Wave Mech Interp
(24:13) Parameter Decomposition In Theory Resolves Anomalies of Second-Wave Mech Interp
(27:27) Conclusion
The original text contained 6 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.