
---
Outline:
(02:07) Background and motivation
(04:38) Solution: transcoders
(06:16) Performance metrics
(10:45) Qualitative interpretability analysis
(10:50) Example transcoder features
(11:34) Broader interpretability survey
(12:28) Circuit analysis
(13:24) Input-independent information: pullbacks and de-embeddings
(15:38) Input-dependent information
(18:20) Obtaining circuits and graphs
(19:42) Brief discussion: why are transcoders better for circuit analysis?
(21:39) Case study
(22:05) Introduction to blind case studies
(23:55) Blind case study on layer 8 transcoder feature 355
(28:44) Evaluating our hypothesis
(29:22) Code
(31:43) Discussion
(35:13) Author contribution statement
(35:45) Appendix
(35:48) For input-dependent feature connections, why pointwise-multiply the feature activation vector with the pullback vector?
(37:09) Comparing input-independent pullbacks with mean input-dependent attributions
(39:46) A more detailed description of the computational graph algorithm
(43:14) Details on evaluating transcoders
The original text contained 11 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.