
This is a linkpost for our two recent papers, produced at Apollo Research:
Not to be confused with Apollo's recent Sparse Dictionary Learning paper.
A key obstacle to mechanistic interpretability is finding the right representation of neural network internals. Ideally, we would derive our features from some high-level principle that holds across different architectures and use cases. At a minimum, we know two things:
---
Narrated by TYPE III AUDIO.