
Two Minute Summary
In this post I present my results from training a Sparse Autoencoder (SAE) on a CLIP Vision Transformer (ViT) using the ImageNet-1k dataset. I have created an interactive web app, 'SAE Explorer', to allow the public to explore the visual features the SAE has learnt, available here: https://sae-explorer.streamlit.app/ (best viewed on a laptop). My results illustrate that SAEs can identify sparse and highly interpretable directions in the residual stream of vision models, enabling inference-time inspection of the model's activations. To demonstrate this, I have included a 'guess the input image' game on the web app that allows users to guess the input image purely from the SAE activations of a single layer and token of the residual stream. I have also uploaded a (slightly outdated) accompanying talk on my results, primarily listing SAE features I found interesting: https://youtu.be/bY4Hw5zSXzQ.
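For readers unfamiliar with the setup, the following is a minimal sketch of what training an SAE on residual-stream activations typically involves: a ReLU encoder into an overcomplete feature basis, a linear decoder, and a reconstruction (MSE) loss plus an l1 penalty that encourages sparse feature activations. This is an illustrative PyTorch sketch with placeholder dimensions and hyperparameters, not the training code used for the results described above.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE over d_model-dimensional residual-stream activations."""

    def __init__(self, d_model: int, d_hidden: int, l1_coeff: float = 1e-3):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)
        self.l1_coeff = l1_coeff

    def forward(self, x: torch.Tensor):
        # Encode into a non-negative, overcomplete feature basis.
        features = torch.relu(self.encoder(x))
        # Reconstruct the original activation from the sparse code.
        x_hat = self.decoder(features)
        # MSE reconstruction loss plus an l1 sparsity penalty on the features.
        mse_loss = (x_hat - x).pow(2).mean()
        l1_loss = self.l1_coeff * features.abs().sum(dim=-1).mean()
        return x_hat, features, mse_loss + l1_loss

# Illustrative usage: activations from one layer/token of a CLIP ViT's residual
# stream, shaped (n_examples, d_model); the numbers here are placeholders.
acts = torch.randn(4096, 1024)
sae = SparseAutoencoder(d_model=1024, d_hidden=8 * 1024)
x_hat, features, loss = sae(acts)
```

In this framing, each decoder column corresponds to one learned feature direction, and checking which entries of `features` are non-zero for a given image is the kind of inference-time inspection that the web app's 'guess the input image' game exposes.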
The primary purpose of this post is [...]
---
Outline:
(00:08) Two Minute Summary
(02:44) Motivation
(04:18) What is a Vision Transformer?
(06:01) What is CLIP?
(07:42) Training the SAE
(08:43) Examples of SAE Features
(09:04) Interesting and Amusing Features
(09:19) Era/Time Features:
(09:48) Place/Culture Features:
(10:24) Film/TV Features:
(10:41) Texture Features:
(10:49) Miscellaneous Features:
(11:07) NSFW Features:
(11:47) How Trustworthy are Highest Activating Images?
(12:27) Tennis Feature
(12:46) Border Terrier Feature
(12:52) Mushrooms Feature
(12:58) Birds on Branches/in Foliage
(13:05) Training Performance
(13:08) Sparsity and l0
(13:56) MSE, l1 and Model Losses
(14:26) Identifying the Ultra-Low Density Cluster
(18:58) Neuron Alignment
(21:56) Future Work
The original text contained 1 footnote which was omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
