
---
Outline:
(02:07) Background and motivation
(04:38) Solution: transcoders
(06:16) Performance metrics
(10:45) Qualitative interpretability analysis
(10:50) Example transcoder features
(11:34) Broader interpretability survey
(12:28) Circuit analysis
(13:24) Input-independent information: pullbacks and de-embeddings
(15:38) Input-dependent information
(18:20) Obtaining circuits and graphs
(19:42) Brief discussion: why are transcoders better for circuit analysis?
(21:39) Case study
(22:05) Introduction to blind case studies
(23:55) Blind case study on layer 8 transcoder feature 355
(28:44) Evaluating our hypothesis
(29:22) Code
(31:43) Discussion
(35:13) Author contribution statement
(35:45) Appendix
(35:48) For input-dependent feature connections, why pointwise-multiply the feature activation vector with the pullback vector?
(37:09) Comparing input-independent pullbacks with mean input-dependent attributions
(39:46) A more detailed description of the computational graph algorithm
(43:14) Details on evaluating transcoders
The original text contained 11 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong
