Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some costs of superposition, published by Linda Linsefors on March 3, 2024 on The AI Alignment Forum.
I don't expect this post to contain anything novel. But from talking to others it seems like some of what I have to say in this post is not widely known, so it seemed worth writing.
In this post I'm defining superposition as: A representation with more features than neurons, achieved by encoding the features as almost orthogonal vectors in neuron space.
One reason to expect superposition in neural nets (NNs) is that for large n, ℝⁿ has many more than n almost orthogonal directions. On the surface, this seems obviously useful for the NN to exploit. However, superposition is not magic. You don't actually get to put in more information; the gain you get from having more feature directions has to be paid for in some other way.
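As a quick numerical illustration of this (not from the post itself; the sizes n = 1000 and m = 10000 are arbitrary example values), random unit vectors in a high-dimensional space are already close to orthogonal, so you can fit far more than n nearly-orthogonal directions into ℝⁿ:

```python
import numpy as np

# Illustration: random unit vectors in high dimensions are nearly
# orthogonal, so R^n can hold many more than n almost-orthogonal
# feature directions. Example sizes are arbitrary.
rng = np.random.default_rng(0)
n, m = 1000, 10000  # 10x more "feature" vectors than dimensions
vecs = rng.standard_normal((m, n))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

# Check pairwise overlaps on a random subset of the vectors.
idx = rng.choice(m, size=500, replace=False)
overlaps = vecs[idx] @ vecs[idx].T
np.fill_diagonal(overlaps, 0.0)
max_overlap = np.abs(overlaps).max()
print(max_overlap)  # small compared to 1 (typically around 0.15 here)
```

The overlaps are small but not zero, which is exactly the trade-off the rest of the post quantifies: those residual overlaps are the noise you pay for the extra directions.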
All the math in this post is very hand-wavey. I expect it to be approximately correct, to one order of magnitude, but not precisely correct.
Sparsity
One cost of superposition is feature activation sparsity. I.e., even though you get to have many possible features, only a few of those features can be simultaneously active.
(I think the restriction of sparsity is widely known, I mainly include this section because I'll need the sparsity math for the next section.)
In this section we'll assume that each feature of interest is a boolean, i.e. it's either turned on or off. We'll investigate how much we can weaken this assumption in the next section.
If you have m features represented by n neurons, with m>n, then you can't have all the features represented by orthogonal vectors. This means that an activation of one feature will cause some noise in the activation of other features.
The typical noise on feature f1 caused by 1 unit of activation from feature f2, for any pair of features f1, f2, is (derived from the Johnson-Lindenstrauss lemma)

ϵ = √(8 ln(m) / n) [1]
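Plugging in some arbitrary example values (n = 1000 neurons, m = 10000 features) shows the scale of ϵ = √(8 ln(m) / n):

```python
import numpy as np

# Evaluate the noise scale eps = sqrt(8 ln(m) / n) for example values.
n, m = 1000, 10000
eps = np.sqrt(8 * np.log(m) / n)
print(round(eps, 3))  # 0.271
```

So even with ten times as many features as neurons, one unit of activation on one feature only induces about a quarter-unit of noise on another.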
If l features are active, then the typical noise level on any other feature will be approximately ϵ√l units. This is because the individual noise terms add up like a random walk. Or see here for an alternative explanation of where the square root comes from.
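The random-walk scaling can be checked directly (the values of ϵ and l below are arbitrary): l noise terms of magnitude ϵ with random signs sum to roughly ϵ√l, not ϵl, because the terms partially cancel.

```python
import numpy as np

# Illustration of random-walk scaling: l noise terms of magnitude eps
# with random signs sum to roughly eps * sqrt(l), not eps * l.
rng = np.random.default_rng(0)
eps, l = 0.05, 100
sums = rng.choice([-eps, eps], size=(10000, l)).sum(axis=1)
print(sums.std())        # close to eps * sqrt(l) = 0.5
print(eps * np.sqrt(l))  # 0.5 (vs. eps * l = 5.0 if terms all aligned)
```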
For the signal to be stronger than the noise we need ϵ√l < 1, and preferably ϵ√l ≪ 1.
This means that we can have at most l ≲ 1/ϵ² = n / (8 ln(m)) simultaneously active features.
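Rearranging ϵ√l < 1 with ϵ = √(8 ln(m) / n) gives the sparsity budget l < n / (8 ln(m)). With the same arbitrary example values as above:

```python
import numpy as np

# Sparsity budget from eps * sqrt(l) < 1 with eps = sqrt(8 ln(m) / n):
# l < 1 / eps^2 = n / (8 ln(m)). Example values are arbitrary.
n, m = 1000, 10000
l_max = n / (8 * np.log(m))
print(round(l_max, 1))  # ~13.6 simultaneously active features
```

So for these numbers, superposition buys ten times as many representable features, but only about a dozen of them can be active at once before noise swamps signal.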