Robots Talking

How AI and LLM Models Think - Robots Talking EP-23



This paper introduces transcoders, a novel method for analyzing the internal computations of large language models (LLMs) by creating sparse approximations of their MLP sublayers. Transcoders learn a wider, sparsely activating MLP to mimic a denser layer, enabling a clearer factorization of model behavior into input-dependent activations and input-invariant weight relationships. The authors demonstrate that transcoders are comparable to or better than sparse autoencoders (SAEs) in interpretability, sparsity, and faithfulness. By applying transcoders to circuit analysis, the research uncovers interpretable subcomputations responsible for specific LLM capabilities, including a detailed examination of the "greater-than circuit" in GPT2-small.
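The idea above can be sketched in code. The following is a minimal, hypothetical illustration (not the paper's implementation): a transcoder replaces a dense MLP sublayer with a wider hidden layer whose ReLU activations are the input-dependent part, while the fixed encoder/decoder weights are the input-invariant part. All dimensions, names, and the initialization are illustrative assumptions.

```python
import math
import random

def relu(x):
    # ReLU zeroes out negative pre-activations, which is what makes
    # the wide hidden layer sparsely activating in practice.
    return [max(0.0, v) for v in x]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

class Transcoder:
    """Sketch: approximate a dense MLP (d_model -> d_mlp -> d_model)
    with a wider map (d_model -> d_hidden -> d_model, d_hidden >> d_mlp)."""

    def __init__(self, d_model, d_hidden, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(d_model)
        self.W_enc = [[rng.uniform(-scale, scale) for _ in range(d_model)]
                      for _ in range(d_hidden)]
        self.b_enc = [0.0] * d_hidden
        self.W_dec = [[rng.uniform(-scale, scale) for _ in range(d_hidden)]
                      for _ in range(d_model)]

    def forward(self, x):
        # Input-dependent factor: sparse feature activations.
        acts = relu([h + b for h, b in zip(matvec(self.W_enc, x), self.b_enc)])
        # Input-invariant factor: fixed decoder weights map features
        # back to the residual-stream dimension.
        out = matvec(self.W_dec, acts)
        return out, acts

tc = Transcoder(d_model=4, d_hidden=16)
out, acts = tc.forward([0.5, -0.2, 0.1, 0.3])
```

In training, such a module would be fit to reproduce the original MLP's outputs under a sparsity penalty on `acts`; circuit analysis then inspects which features activate and how the fixed weights connect them.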


Robots Talking, by mstraton8112