
This is an interim report that we are currently building on. We hope this update will be useful to related research occurring in parallel. Produced as part of the ML Alignment & Theory Scholars Program, Winter 2023-24 Cohort.
Executive Summary
---
Outline:
(00:32) Executive Summary
(05:57) Technique: Attention Head Attribution via Attention Layer SAEs
(07:54) Overview of Attention Heads Across Layers
(11:03) Investigating Attention Head Polysemanticity
(13:28) Discovering Plausibly Monosemantic Heads
(14:50) Case Study: Long Prefix Induction Head
(20:18) Appendix: Attention Heads Feature Map
(20:42) Citing this work
(20:55) Author Contributions Statement
The original text contained 3 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.