


Mechanistic interpretability needs its own shoe leather era. Reproducing the labeling process will matter more than reproducing the GitHub.
Crossposted from Communication & Intelligence substack
When we try to understand large language models, we like to invoke causality. And who can blame us? Causal inference comes with an impressive toolkit: directed acyclic graphs, potential outcomes, mediation analysis, formal identification results. It feels crisp. It feels reproducible. It feels like science.
But there is a precondition to the entire enterprise that we almost always skip past: you need well-defined causal variables. And defining those variables is not part of causal inference. It is prior to it — a subjective, pre-formal step that the formalism cannot provide and cannot validate.
Once you take this seriously, the consequences are severe. Every choice of variables induces a different hypothesis space. Every hypothesis space you didn't choose is one you can't say anything about. And the space of possible causal models compatible with any given phenomenon is not merely vast in the familiar senses — not just combinatorial over DAGs, or over the space of all possible parameterizations — but over all possible variable definitions, which is almost incomprehensibly vast. [...]
---
Outline:
(02:03) The variable definition problem
(05:02) We keep getting confused about interventions
(06:15) Shoe leather for LLM interpretability
(08:12) Trying to practice what I'm preaching
(09:02) The punchline
The original text contained 3 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong
