The Nonlinear Library

AF - Paper Replication Walkthrough: Reverse-Engineering Modular Addition by Neel Nanda



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Paper Replication Walkthrough: Reverse-Engineering Modular Addition, published by Neel Nanda on March 12, 2023 on The AI Alignment Forum.
I'm excited about trying different formats for mechanistic interpretability education! I've made a video walkthrough in which we replicate my paper, Progress Measures for Grokking via Mechanistic Interpretability. Jess Smith, one of my co-authors, and I recorded ourselves coding a replication and discussed what we did at each step. This is a three-part walkthrough, and you can see the accompanying code for the walkthrough here:
In part 1, we train a model to perform modular addition, and see that it does grok!
In part 2, we take this model and reverse-engineer the trig-based circuit it has learned for modular addition. We show both that you can read out intermediate steps of the circuit from the activations, and that you can read off some of the algorithm's steps directly from the model weights.
In part 3, we define some progress measures that let us distinguish progress towards the generalising algorithm from progress towards the memorising one. We then look at the model during training, watch how the circuits develop, and use this to understand why it groks.
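To make the "trig-based circuit" from part 2 concrete, here is a minimal sketch of that style of algorithm in plain Python. It is not the walkthrough's code and not the trained model: the modulus P and the frequency list FREQS are illustrative assumptions, and the function names are hypothetical. The idea is to represent each input as sines and cosines at a few frequencies, combine them with the angle-addition identities, and score each candidate answer c by the sum over frequencies of cos(w(a + b - c)), which is largest exactly when c = (a + b) mod P.

```python
import math

P = 113             # modulus; an assumption for this sketch
FREQS = [1, 5, 12]  # hypothetical "key frequencies", chosen for illustration


def trig_logits(a, b, p=P, freqs=FREQS):
    """Score every candidate c with sum_k cos(w_k * (a + b - c))."""
    logits = [0.0] * p
    for k in freqs:
        w = 2 * math.pi * k / p
        # Step 1: embed each input as (cos, sin) at frequency w.
        cos_a, sin_a = math.cos(w * a), math.sin(w * a)
        cos_b, sin_b = math.cos(w * b), math.sin(w * b)
        # Step 2: angle-addition identities give cos and sin of w * (a + b).
        cos_ab = cos_a * cos_b - sin_a * sin_b
        sin_ab = sin_a * cos_b + cos_a * sin_b
        # Step 3: cos(w(a+b-c)) = cos(w(a+b))cos(wc) + sin(w(a+b))sin(wc).
        for c in range(p):
            logits[c] += cos_ab * math.cos(w * c) + sin_ab * math.sin(w * c)
    return logits


def predict(a, b, p=P, freqs=FREQS):
    """Return the candidate with the highest score."""
    logits = trig_logits(a, b, p, freqs)
    return max(range(p), key=logits.__getitem__)
```

Calling `predict(100, 50)` scores all 113 candidates and picks the one where every cosine term equals 1, which is (100 + 50) mod 113. Because P is prime, each frequency on its own already singles out the right answer; summing several just widens the margin between the correct candidate and the rest.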
This is an experiment with a new format, and I'd love to hear about how useful you find it!
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library, by The Nonlinear Fund