The Daily ML

Ep42. The Geometry of Concepts: Sparse Autoencoder Feature Structure



This research paper investigates the structure of the concept universe represented by large language models (LLMs), focusing on how sparse autoencoders (SAEs) can be used to discover and analyze concepts within these models. The authors explore this structure at three distinct scales: the “atomic” scale, where they look for geometric patterns representing semantic relationships between concepts; the “brain” scale, where they identify clusters of features that tend to fire together within a document and are spatially localized; and the “galaxy” scale, where they examine the overall shape and clustering of the feature space. The authors find that the concept universe exhibits a surprising degree of structure, suggesting that SAEs can be a powerful tool for understanding the inner workings of LLMs.

By The Daily ML