November 09, 2024

Ep42. The Geometry of Concepts: Sparse Autoencoder Feature Structure

13 minutes

This research paper investigates the structure of the concept universe represented by large language models (LLMs), focusing on how sparse autoencoders (SAEs) can be used to discover and analyze concepts within these models. The authors explore this structure at three scales: the “atomic” scale, where they look for geometric patterns that represent semantic relationships between concepts; the “brain” scale, where they identify clusters of features that tend to fire together within a document and are also spatially localized; and the “galaxy” scale, where they examine the overall shape and clustering of the feature space. They find that the concept universe exhibits a surprising degree of structure at each scale, suggesting that SAEs can be a powerful tool for understanding the inner workings of LLMs.

By The Daily ML
