
Sign up to save your podcasts
Or
Background
Sparse autoencoders recover a diversity of interpretable, monosemantic features, but present an intractable problem of scale to human labelers. We investigate different techniques for generating and scoring text explanations of SAE features.
Key Findings
Generating Explanations
Sparse autoencoders decompose activations into a sum of sparse feature directions. We leverage language models to generate explanations for activating text examples. Prior work [...]
The original text contained 3 images which were described by AI.
---
First published:
Source:
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
Background
Sparse autoencoders recover a diversity of interpretable, monosemantic features, but present an intractable problem of scale to human labelers. We investigate different techniques for generating and scoring text explanations of SAE features.
Key Findings
Generating Explanations
Sparse autoencoders decompose activations into a sum of sparse feature directions. We leverage language models to generate explanations for activating text examples. Prior work [...]
The original text contained 3 images which were described by AI.
---
First published:
Source:
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
26,420 Listeners
2,387 Listeners
7,893 Listeners
4,126 Listeners
87 Listeners
1,458 Listeners
9,040 Listeners
87 Listeners
390 Listeners
5,431 Listeners
15,216 Listeners
474 Listeners
121 Listeners
75 Listeners
459 Listeners