
In this post I'll discuss an apparent limitation of sparse autoencoders (SAEs) in their current formulation, as applied to discovering latent features within AI models such as transformer-based LLMs. In brief, I'll cover the following:
---
Outline:
(01:07) Rough definition of true features
(02:17) Why SAEs are incentivised to discover combinations of features rather than individual features
(08:33) Relation to feature splitting
(14:57) Proposed solution
---
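As background for the discussion, here is a minimal sketch of a sparse autoencoder of the kind typically trained on transformer activations: an overcomplete hidden layer that reconstructs the input, with an L1 penalty encouraging sparse hidden activations. The dimensions and the `l1_coeff` value are illustrative assumptions, not taken from the post.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: reconstructs model activations through an
    overcomplete hidden layer whose activations are pushed toward
    sparsity by an L1 penalty on the training loss."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))  # hidden "feature" activations
        x_hat = self.decoder(f)          # reconstruction of the input
        return x_hat, f

# Illustrative training step (d_model, d_hidden, and l1_coeff are assumptions).
d_model, d_hidden, l1_coeff = 512, 4096, 1e-3
sae = SparseAutoencoder(d_model, d_hidden)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

acts = torch.randn(64, d_model)  # stand-in for a batch of transformer activations
x_hat, f = sae(acts)
loss = ((x_hat - acts) ** 2).mean() + l1_coeff * f.abs().sum(dim=-1).mean()
loss.backward()
opt.step()
```

The L1 term is the source of the incentive the post analyses: the SAE can reduce its sparsity penalty by dedicating a single hidden unit to a frequently co-occurring combination of features rather than one unit per individual feature.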
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong
