LessWrong (30+ Karma)

“[Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders” by chanind, TomasD, hrdkbhatnagar, Joseph Bloom


Listen Later

This research was completed for London AI Safety Research (LASR) Labs 2024. The team was supervised by Joseph Bloom (Decode Research). Find out more about the programme and express interest in upcoming iterations here.

This high level summary will be most accessible to those with relevant context including an understanding of SAEs. The importance of this work rests in part on the surrounding hype, and potential philosophical issues. We encourage readers seeking technical details to read the paper on arxiv.

TLDR: This is a short post summarising the key ideas and implications of our recent work studying how character information represented in language models is extracted by SAEs. Our most important result shows that SAE latents can appear to classify some feature of the input, but actually turn out to be quite unreliable classifiers (much worse than linear probes). We think this unreliability is in part due to difference [...]

The original text contained 1 footnote which was omitted from this narration.

The original text contained 1 image which was described by AI.

---

First published:

September 25th, 2024

Source:

https://www.lesswrong.com/posts/3zBsxeZzd3cvuueMJ/paper-a-is-for-absorption-studying-feature-splitting-and

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

...more
View all episodesView all episodes
Download on the App Store

LessWrong (30+ Karma)By LessWrong


More shows like LessWrong (30+ Karma)

View all
The Daily by The New York Times

The Daily

113,290 Listeners

Astral Codex Ten Podcast by Jeremiah

Astral Codex Ten Podcast

132 Listeners

Interesting Times with Ross Douthat by New York Times Opinion

Interesting Times with Ross Douthat

7,259 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

566 Listeners

The Ezra Klein Show by New York Times Opinion

The Ezra Klein Show

16,498 Listeners

AI Article Readings by Readings of great articles in AI voices

AI Article Readings

4 Listeners

Doom Debates! by Liron Shapira

Doom Debates!

14 Listeners

LessWrong posts by zvi by zvi

LessWrong posts by zvi

2 Listeners