November 11, 2025

222-InterPLM: Interpretable Protein Language Models

21 minutes

The paper introduces "InterPLM," a systematic framework for interpreting protein language models (PLMs) using sparse autoencoders (SAEs). This method successfully extracts thousands of interpretable features from PLMs like ESM-2, revealing biological concepts such as binding sites and functional domains that are stored in superposition within the model's neurons. The research demonstrates that SAE features show significantly stronger alignment with known biological annotations than individual neurons and that larger PLMs capture a broader range of concepts. Furthermore, the framework leverages large language models for automated feature description and validation, showing that feature activations can identify missing database annotations and enable the targeted steering of sequence generation.
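As a rough illustration of the sparse autoencoder idea described above, here is a minimal NumPy sketch of a forward pass and loss. It is a generic SAE under stated assumptions, not the paper's implementation: the dimensions, the L1 coefficient, the function names (`init_sae`, `sae_forward`, `sae_loss`), and the random matrix standing in for per-residue ESM-2 embeddings are all hypothetical.

```python
import numpy as np

def init_sae(d_model, d_hidden, seed=0):
    """Randomly initialise a sparse autoencoder (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    W_enc = rng.normal(0.0, 0.1, size=(d_model, d_hidden))
    W_dec = rng.normal(0.0, 0.1, size=(d_hidden, d_model))
    # Unit-norm decoder rows, so each feature has a fixed-length direction.
    W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)
    return {
        "W_enc": W_enc,
        "b_enc": np.zeros(d_hidden),
        "W_dec": W_dec,
        "b_dec": np.zeros(d_model),
    }

def sae_forward(params, x):
    """Encode embeddings into non-negative features, then reconstruct."""
    pre = (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"]
    features = np.maximum(0.0, pre)  # ReLU keeps activations non-negative
    recon = features @ params["W_dec"] + params["b_dec"]
    return features, recon

def sae_loss(x, features, recon, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparsity."""
    mse = np.mean(np.sum((x - recon) ** 2, axis=1))
    l1 = np.mean(np.sum(np.abs(features), axis=1))
    return mse + l1_coeff * l1

# Assumption: random data stands in for per-residue PLM embeddings
# (8 residues, 16 dimensions); real embeddings are much wider.
rng = np.random.default_rng(1)
embeddings = rng.normal(size=(8, 16))

params = init_sae(d_model=16, d_hidden=64)
features, recon = sae_forward(params, embeddings)
loss = sae_loss(embeddings, features, recon)
```

The hidden layer is wider than the input (64 versus 16 here), which is what lets concepts stored in superposition across neurons be pulled apart into separate features; training would minimise `sae_loss` over many embeddings.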

References:

Simon E, Zou J. InterPLM: Discovering interpretable features in protein language models via sparse autoencoders. arXiv preprint arXiv:2412.12101, 2024. URL: arxiv.org/abs/2412.12101

By 淼淼Elva