The Nonlinear Library

AF - More findings on Memorization and double descent by Marius Hobbhahn



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: More findings on Memorization and double descent, published by Marius Hobbhahn on February 1, 2023 on The AI Alignment Forum.
Produced as part of the SERI ML Alignment Theory Scholars Program - Winter 2022 Cohort.
I’d like to thank Wes Gurnee, Aryan Bhatt, Eric Purdy and Stefan Heimersheim for discussions and Evan Hubinger, Neel Nanda, Adam Jermyn and Chris Olah for mentorship and feedback.
The post contains a lot of figures, so the suggested reading time is deceiving. Code can be found in these three Colab notebooks [1][2][3].
I have split the post into two parts: the first covers double descent and other general findings on memorization, and the second focuses on measuring memorization with the maximum data dimensionality metric. This is the first post in a series of N posts on memorization in transformers.
Executive summary
I look at a variety of settings and experiments to better understand memorization in toy models. My primary motivation is to increase our general understanding of NNs but I also suspect that understanding memorization better might increase our ability to detect backdoors/trojans.
The work builds heavily on two papers by Anthropic, “Toy Models of Superposition” and “Superposition, Memorization, and Double Descent”. I successfully replicate a subset of their findings.
I specifically look at three different setups of NNs that I speculate are most relevant to understanding memorization in the non-attention parts of transformers (a minimal code sketch of all three follows the list below).
Bottlenecks between layers, i.e. when projecting from high-dimensional spaces (e.g. MLPs) into lower dimensions (e.g. the residual stream). This is similar to the setting in the Toy Models of Superposition paper and its sequel.
MLP blocks, i.e. when projecting from lower-dimensional spaces (e.g. the residual stream) into higher dimensions with ReLU non-linearities.
The final layer, i.e. when projecting from the end of the residual stream into the vocab space. The main difference from the previous scenarios is that these experiments use the cross-entropy loss, which has a different inductive bias than the MSE loss.
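To make these settings concrete, here is a minimal PyTorch sketch of all three. This is a plausible reading of the setups above, not the exact code from the notebooks; the module names, dimensions, and the tied-weight choice in the bottleneck are my assumptions for illustration.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Setting 1: project n_features down to n_hidden and back up,
    with tied weights and a ReLU, as in Toy Models of Superposition."""
    def __init__(self, n_features, n_hidden):
        super().__init__()
        self.W = nn.Parameter(0.1 * torch.randn(n_features, n_hidden))
        self.b = nn.Parameter(torch.zeros(n_features))

    def forward(self, x):                          # x: (batch, n_features)
        h = x @ self.W                             # low-dimensional bottleneck
        return torch.relu(h @ self.W.T + self.b)   # tied up-projection

class MLPBlock(nn.Module):
    """Setting 2: project a low-dimensional residual stream up into a
    wider space with a ReLU non-linearity, then back down."""
    def __init__(self, d_resid, d_mlp):
        super().__init__()
        self.up = nn.Linear(d_resid, d_mlp)
        self.down = nn.Linear(d_mlp, d_resid)

    def forward(self, x):
        return self.down(torch.relu(self.up(x)))

class Unembed(nn.Module):
    """Setting 3: project the end of the residual stream into vocab space;
    trained with cross-entropy (nn.CrossEntropyLoss) rather than MSE."""
    def __init__(self, d_resid, n_vocab):
        super().__init__()
        self.proj = nn.Linear(d_resid, n_vocab)

    def forward(self, x):
        return self.proj(x)  # logits, to be fed into nn.CrossEntropyLoss
```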
I’m able to find the double descent phenomenon in all three settings. My takeaway is that the transition between memorization and learning general features seems to be a regular, predictable phenomenon (assuming you know the sparsity and the number of features of your network). Furthermore, the network seems “confused” (e.g. has much higher test loss) right at the boundary between memorization and generalization.
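A sketch of the kind of sweep that surfaces double descent, reusing the tied-weight bottleneck above. The data distribution, hyperparameters, and dataset sizes are illustrative assumptions, not the post's exact configuration.

```python
import torch

def make_sparse_batch(n, n_features, sparsity):
    # Each feature is independently zero with probability `sparsity`,
    # otherwise uniform in [0, 1] -- an assumed distribution in the
    # spirit of the superposition papers.
    x = torch.rand(n, n_features)
    present = (torch.rand(n, n_features) > sparsity).float()
    return x * present

def train_and_eval(n_train, n_features=64, n_hidden=8,
                   sparsity=0.9, steps=3000, seed=0):
    torch.manual_seed(seed)
    train = make_sparse_batch(n_train, n_features, sparsity)
    test = make_sparse_batch(4096, n_features, sparsity)
    # Tied-weight bottleneck model from the sketch above.
    W = (0.1 * torch.randn(n_features, n_hidden)).requires_grad_()
    b = torch.zeros(n_features, requires_grad=True)
    opt = torch.optim.Adam([W, b], lr=1e-3)
    for _ in range(steps):
        out = torch.relu(train @ W @ W.T + b)
        loss = ((out - train) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        out = torch.relu(test @ W @ W.T + b)
        return ((out - test) ** 2).mean().item()

# Sweep the dataset size; in the double descent picture, test loss
# should peak near the interpolation threshold and fall again afterwards.
for n in [4, 8, 16, 32, 64, 128, 256, 512, 1024]:
    print(f"n_train={n:5d}  test_mse={train_and_eval(n):.4f}")
```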
I test the limits of reconstruction in different settings, i.e. the network's ability to reconstruct its inputs across different dataset sizes, hidden sizes, numbers of features, importance distributions, and sparsities. The findings mostly confirm what we would predict, e.g. more sparsity or larger hidden sizes lead to better reconstructions. A speculative claim is that if we had better measures of sparsity and importance in real-world models, we might be able to derive scaling laws that tell us how many “concepts” a network has learned.
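One knob in these reconstruction experiments is the importance distribution. A minimal sketch of an importance-weighted MSE loss, assuming a geometric importance falloff as in the superposition papers; the decay constant 0.95 is an arbitrary choice for illustration, not a value from the post.

```python
import torch

n_features = 64
decay = 0.95  # assumed geometric falloff; not a value from the post
importance = decay ** torch.arange(n_features, dtype=torch.float32)  # I_i = decay**i

def weighted_mse(x_hat, x):
    # L = mean over the batch of sum_i I_i * (x_i - x_hat_i)^2
    return (importance * (x_hat - x) ** 2).sum(dim=-1).mean()
```

Swapping this in for the plain MSE in the sweep above lets you vary the importance distribution alongside sparsity and hidden size.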
Interpreting NNs that have memorized in the simplest settings is extremely straightforward: the network literally creates a dictionary that you can read off the weights. However, even small increases in complexity make this dictionary much harder to read, and I have not yet found a method to decompile it into a human-readable form (maybe in the next posts).
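As an illustration of "reading the dictionary off the weights", here is a sketch under the assumption that memorization is clean: in a memorizing tied-weight bottleneck, each hidden direction is approximately a rescaled copy of one training example, so matching columns of W against the training set by cosine similarity recovers the dictionary. The function name and this assumption are mine, not the post's.

```python
import torch
import torch.nn.functional as F

def match_units_to_examples(W, train_x):
    """For each hidden direction (column of W), find the training example
    it most resembles. If the network memorized cleanly, each column is
    close to a rescaled datapoint and the max cosine similarity is ~1.
    W: (n_features, n_hidden), train_x: (n_train, n_features)."""
    cols = F.normalize(W.T, dim=-1)      # (n_hidden, n_features) unit vectors
    data = F.normalize(train_x, dim=-1)  # (n_train, n_features) unit vectors
    sims = cols @ data.T                 # (n_hidden, n_train) cosine sims
    best_sim, best_idx = sims.max(dim=-1)
    return best_idx, best_sim            # which example each unit stores
```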
Isolated components
In the following, we isolate three settings that seem like important components of memorization. They are supposed to model the non-attention parts of a transformer (primarily because I speculate that memorization mostly happens in the non-attention parts).
Bottleneck
By bottleneck we mean a situation in which a model projects from many into fewer dimensions.