Paper Talk

222-InterPLM: Interpretable Protein Language Models



The paper introduces "InterPLM," a systematic framework for interpreting protein language models (PLMs) using sparse autoencoders (SAEs). This method successfully extracts thousands of interpretable features from PLMs like ESM-2, revealing biological concepts such as binding sites and functional domains that are stored in superposition within the model's neurons. The research demonstrates that SAE features show significantly stronger alignment with known biological annotations than individual neurons and that larger PLMs capture a broader range of concepts. Furthermore, the framework leverages large language models for automated feature description and validation, showing that feature activations can identify missing database annotations and enable the targeted steering of sequence generation.
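To make the sparse autoencoder idea concrete, here is a minimal sketch in plain Python. It is an illustration only, not the paper's implementation: the class name, layer sizes, initialization, and the L1 coefficient are all assumptions, and training is omitted. It shows the general shape of the technique: a per-residue embedding is expanded into a wider, non-negative feature vector, and the loss combines reconstruction error with a sparsity penalty.

```python
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


class SparseAutoencoder:
    """Toy SAE (hypothetical): maps a d_model embedding to d_hidden features."""

    def __init__(self, d_model, d_hidden, seed=0):
        rng = random.Random(seed)
        # Small random weights; biases start at zero (illustrative choices).
        self.W_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(d_hidden)]
        self.b_enc = [0.0] * d_hidden
        self.W_dec = [[rng.gauss(0, 0.1) for _ in range(d_hidden)]
                      for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # ReLU keeps feature activations non-negative, so many can be exactly 0.
        pre = matvec(self.W_enc, x)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, f):
        # Linear map from the feature space back to the embedding space.
        return [r + b for r, b in zip(matvec(self.W_dec, f), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        # Features are non-negative, so their sum equals their L1 norm.
        return mse + l1_coeff * sum(f)


# Stand-in for one residue's embedding from a PLM layer (made-up values).
embedding = [0.5, -1.0, 0.25, 0.0, 1.5, -0.75, 0.1, 0.9]
sae = SparseAutoencoder(d_model=8, d_hidden=32)
features = sae.encode(embedding)
reconstruction = sae.decode(features)
```

In a trained model, each entry of `features` would ideally track one concept (for example, a binding site), which is what makes the expanded representation easier to interpret than individual neurons.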

References:

  • Simon E, Zou J. InterPLM: Discovering interpretable features in protein language models via sparse autoencoders. 2024. URL: arxiv.org/abs/2412.12101

Paper Talk, by 淼淼Elva