


TLDR;
In previous work, we found a problematic form of feature splitting called "feature absorption" when analyzing Gemma Scope SAEs. We hypothesized that this was due to SAEs struggling to separate co-occurring features, but we did not prove this. In this post, we set up toy models where we can explicitly control feature representations and co-occurrence rates and show the following:
All code for this post can be seen in this Colab notebook.
The rest of this post will assume [...]
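As a minimal illustration of what "explicitly controlling feature representations and co-occurrence rates" can look like in a toy setup (the directions, rates, and the perfect-co-occurrence choice below are hypothetical, not taken from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, d_model = 10_000, 4, 16

# Each latent feature gets a fixed unit direction in model space.
directions = rng.standard_normal((n_features, d_model))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# Features fire independently at a base rate, except feature 1,
# which is forced to fire only when feature 0 fires (perfect co-occurrence).
base_rate = 0.3
active = rng.random((n_samples, n_features)) < base_rate
active[:, 1] &= active[:, 0]

# Activations are sums of the active features' directions.
acts = active.astype(float) @ directions

# By construction, feature 1 never fires without feature 0.
violations = int((active[:, 1] & ~active[:, 0]).sum())
print(violations)  # 0
```

An SAE trained on `acts` can then be inspected for absorption: whether the learned latent for feature 0 silently carries part of feature 1's direction.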
---
Outline:
(00:06) TLDR;
(01:18) What is feature absorption?
(02:15) How is this different than traditional feature splitting?
(03:47) Why does absorption happen?
(04:17) How big of a problem is this, really?
(05:04) Toy Models of Feature Absorption Setup
(05:29) Non-superposition setup
(07:12) Superposition setup
(07:53) Perfect Reconstruction with Independent Features
(08:39) Feature co-occurrence causes absorption
(10:33) Magnitude variance causes partial absorption
(12:35) Why does partial absorption happen?
(13:13) Imperfect co-occurrence can still lead to absorption depending on L1 penalty
(16:03) Tying the SAE encoder and decoder weights solves feature absorption
(17:08) Absorption in superposition
(19:13) Tying the encoder and decoder weights still solves feature absorption in superposition
(19:53) Future work
The original text contained 10 images which were described by AI.
---
First published:
Source:
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
By LessWrong
