


We trained a sparse autoencoder (SAE) to find sparse features in image embeddings, and found many meaningful, interpretable, and steerable features. Steering image diffusion with these features works surprisingly well, yielding predictable, high-quality generations.
You can see the feature library here. We also have an intervention playground you can try.
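The intervention described above can be sketched in a few lines: encode an image embedding into sparse SAE activations, boost one feature, and decode back to embedding space for the diffusion model. This is a minimal toy illustration, not the post's implementation; the weights, sizes, and names (`W_enc`, `W_dec`, `steer`) are hypothetical.

```python
import numpy as np

# Toy single-layer SAE with random weights (hypothetical; a real SAE is trained).
rng = np.random.default_rng(0)
d_embed, d_sae = 8, 32                          # toy dimensions, not the post's
W_enc = rng.normal(size=(d_embed, d_sae))
W_dec = rng.normal(size=(d_sae, d_embed))

def steer(embedding, feature_idx, strength):
    """Encode, boost one sparse feature, decode back to embedding space."""
    acts = np.maximum(embedding @ W_enc, 0.0)   # ReLU gives sparse activations
    acts[feature_idx] += strength               # intervene on a single feature
    return acts @ W_dec                          # steered embedding for the decoder

emb = rng.normal(size=d_embed)
steered = steer(emb, feature_idx=3, strength=5.0)
```

The steered embedding would then be passed to the image diffusion model in place of the original, shifting the generation in the direction the chosen feature represents.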
Key Results
---
Outline:
(00:30) Key Results
(01:20) Interactive demo
(01:33) Introduction
(02:22) Steering Features
(04:22) Discovering and Interpreting Features
(06:22) Autointerpretation Labels
(07:13) Training Details
(08:34) Future Work
---
Narrated by TYPE III AUDIO.
By LessWrong
