Exploring the Lottery Ticket Hypothesis, published by Rauno Arike on April 25, 2023 on LessWrong.
I have recently been fascinated by the breadth of important mysteries in deep learning, including deep double descent and phase changes, that could be explained by a curious conjectured property of neural networks called the lottery ticket hypothesis. Despite this explanatory potential, however, I haven't seen much discussion about the evidence behind and the implications of this hypothesis in the alignment community. Being confused about these things motivated me to conduct my own survey of the phenomenon, which resulted in this post.
The Lottery Ticket Hypothesis, explained in one minute
The lottery ticket hypothesis (LTH) was originally proposed in a paper by Frankle and Carbin (2018):
A randomly-initialized, dense neural network contains a subnetwork that is initialized such that—when trained in isolation—it can match the test accuracy of the original network after training for at most the same number of iterations.
The authors call such subnetworks "winning lottery tickets". As the simplest example, they train a LeNet-300-100 model on the MNIST dataset and report that a network containing only 21.1% of the weights of the dense version reaches a higher test accuracy in fewer training iterations, while a network in which only 3.6% of the weights remain performs almost identically to the dense network.
The lottery ticket hypothesis extends a long line of work in neural network pruning, a technique proposed as early as 1990 by LeCun et al. Pruning simply means deleting some fraction of unimportant weights from the network after training to make inference more efficient. The vital insight of the lottery ticket hypothesis paper is that it may also be possible to prune the network before training to make both training and inference more efficient.
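For concreteness, here is a minimal sketch of what post-training pruning can look like in PyTorch. It uses simple weight-magnitude pruning, a common heuristic, rather than the second-derivative-based saliency of LeCun et al.; the function name and the pruning fraction are illustrative assumptions, not anyone's exact implementation.

```python
# Sketch of post-training magnitude pruning (illustrative, assuming PyTorch).
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, fraction: float = 0.8) -> dict:
    """Zero out the `fraction` of weights with the smallest absolute value,
    layer by layer, after the model has been trained."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:  # leave biases untouched
            continue
        threshold = torch.quantile(param.abs().flatten(), fraction)
        masks[name] = (param.abs() > threshold).float()
        with torch.no_grad():
            param.mul_(masks[name])  # pruned weights stay at zero for inference
    return masks
```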
In practice, the method that Frankle and Carbin used for finding winning tickets didn't yet eliminate the need to train the full network; it only suggested that this might be possible. The technique they used is iterative pruning, a procedure that roughly looks as follows:
Train the full dense network on some classification task
Prune out some fraction of the weights with the smallest magnitude
Reinitialize the remaining weights to their original values
Repeat this procedure a number of times
This is computationally quite expensive, but as we'll see below, alternative approaches have been proposed later on.
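To make the procedure concrete, here is a rough sketch of the iterative magnitude-pruning-with-rewinding loop, assuming PyTorch. The `train_fn` argument is a hypothetical training function that is assumed to keep masked weights at zero during training, and the pruning fraction and number of rounds are arbitrary placeholders; this illustrates the general idea rather than Frankle and Carbin's exact implementation.

```python
# Rough sketch of iterative pruning with weight rewinding (assumes PyTorch;
# train_fn is a hypothetical function that trains the model while keeping
# masked weights at zero). Not the authors' exact implementation.
import copy
import torch

def find_winning_ticket(model, train_fn, prune_fraction=0.2, rounds=5):
    # Save the original initialization so the surviving weights can be rewound.
    init_state = copy.deepcopy(model.state_dict())
    # Start with a mask that keeps every weight.
    masks = {name: torch.ones_like(p) for name, p in model.named_parameters()}

    for _ in range(rounds):
        # 1. Train the (masked) network to completion.
        train_fn(model, masks)
        # 2. Prune a fraction of the smallest-magnitude surviving weights.
        for name, param in model.named_parameters():
            if param.dim() < 2:  # leave biases unpruned
                continue
            alive = param[masks[name].bool()].abs()
            threshold = torch.quantile(alive, prune_fraction)
            masks[name] *= (param.abs() > threshold).float()
        # 3. Rewind the remaining weights to their original initial values.
        model.load_state_dict(init_state)
        with torch.no_grad():
            for name, param in model.named_parameters():
                param.mul_(masks[name])

    return model, masks
```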
The paper also defined a stronger version of the hypothesis that they named the lottery ticket conjecture:
We extend our hypothesis into an untested conjecture that SGD seeks out and trains a subset of well-initialized weights. Dense, randomly-initialized networks are easier to train than the sparse networks that result from pruning because there are more possible subnetworks from which training might recover a winning ticket.
In the next section, I'll argue that the distinction between the hypothesis and the conjecture is quite important.
Relevance to alignment
Phase changes
In his post about mechanistically interpreting grokking, Neel Nanda argues that the lottery ticket hypothesis may explain why neural networks form sophisticated circuits.
One may naively think that neural networks are optimized in a similar way to how linear regression classifiers are optimized: each weight slowly changes in a direction that marginally improves performance, and these tiny individual improvements smoothly improve the performance of the entire ensemble. In practice, though, we observe the formation of sophisticated circuits like the induction circuit, which consists of a previous token head and an induction head. Either of those heads improves loss only in the case the other ...