Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Scaling laws vs individual differences, published by beren on January 10, 2023 on LessWrong.
Crossposted from my personal blog
Epistemic Status: This is a quick post on something I have been confused about for a while. If an answer to this is known, please reach out and let me know!
In ML we find that the performance of models tends towards some kind of power-law relationship between the loss and the amount of data in the dataset or the number of parameters of the model. What this means is, in effect, that to get a constant proportional decrease in the loss, we need to increase either the data or the model or some combination of both by a constant factor. Power law scaling appears to occur for most models studied, including in extremely simple toy examples, and hence appears to be some kind of fundamental property of how 'intelligence' scales, although for reasons that are at the moment quite unclear (at least to me -- if you know please reach out and tell me!).
Crucially, power law scaling is actually pretty bad and means that performance grows relatively slowly with scale. A model with twice as many parameters or twice as much data does not perform twice as well. These diminishing returns to intelligence are of immense importance for forecasting AI risks since whether FOOM is possible or not depends heavily on the returns to increasing intelligence in the range around the human level.
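To make the shape of this relationship concrete, here is a minimal sketch assuming a pure power law of the form L(n) = a * n^(-alpha). The function names, the constant a, and the exponent alpha = 0.1 are illustrative assumptions, not values taken from any particular model or paper.

```python
def power_law_loss(n, a=1.0, alpha=0.1):
    """Loss under an assumed pure power law L(n) = a * n**(-alpha).

    n is the scale (parameters or data); a and alpha are hypothetical.
    """
    return a * n ** (-alpha)


def loss_ratio(scale_factor, alpha=0.1):
    """Factor by which the loss is multiplied when scale grows by scale_factor.

    Under a pure power law this does not depend on the starting scale.
    """
    return scale_factor ** (-alpha)


def scale_needed(loss_fraction, alpha=0.1):
    """Scale multiplier needed to shrink the loss to loss_fraction of its value."""
    return loss_fraction ** (-1.0 / alpha)


if __name__ == "__main__":
    # Doubling scale gives the same proportional improvement at any size.
    for n in (1e6, 1e9):
        print(n, power_law_loss(2 * n) / power_law_loss(n))
    print("loss multiplier for 2x scale:", loss_ratio(2.0))
    print("scale multiplier to halve the loss:", scale_needed(0.5))
```

With this assumed exponent, doubling the scale trims the loss by only about 7%, and halving the loss takes roughly a 1000-fold increase in scale. A smaller exponent would make the returns slower still.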
In biology, we also see power law scaling between species. For instance, there is a clear power law scaling curve relating the brain size of various species to roughly how 'intelligent' we think they are. Indeed, there are general cross-species scaling laws for intelligence and neuron count and density, with primates being on a superior scaling law to most other animals. These scaling laws are again slow. It takes a very significant number of additional neurons or increase in brain size to really move the needle on observed intelligence. We also see that brain size, unsurprisingly, is a very effective measure of the 'parameter count', at least within species which share the same neural density scaling laws.
However, on the inside view, we know there are significant differences in intellectual performance between humans. Performance on different tasks is also strongly correlated, such that if someone is bad or good at one task, it is pretty likely that they will also be bad or good at another. If you analyze many such intellectual tasks and perform factor analysis, you tend to get a single dominant factor, which is called the general intelligence factor g. Numerous studies have demonstrated that IQ is a highly reliable measure, that it is strongly correlated with performance measures such as occupational success, and that a substantial component of IQ is genetic. However, genetic variation among humans in key parameters such as brain size or neuron count, as well as in data input, while extant, is very small compared to the multiplicative factors over which the scaling laws operate.
Natural human brain size variation does not span even a 2x range in brain volume, let alone a 10x or multiple-order-of-magnitude difference. The scaling laws view would therefore predict that individual differences in IQ between humans are very small, and essentially logarithmic on the loss.
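As a rough illustration of this prediction, the sketch below compares the proportional loss reduction a pure power law would assign to a 2x size difference (the upper bound on human brain volume variation stated above) against 10x and 1000x differences. The exponent ALPHA = 0.1 and the function name are hypothetical choices for illustration only; nothing here is a measured property of brains.

```python
ALPHA = 0.1  # hypothetical scaling exponent, chosen only for illustration


def relative_loss_reduction(size_ratio, alpha=ALPHA):
    """Fractional loss reduction predicted when size grows by size_ratio,
    assuming a pure power law L(n) proportional to n**(-alpha)."""
    return 1.0 - size_ratio ** (-alpha)


# Compare a within-human-sized ratio against much larger ratios.
for label, ratio in [("2x", 2.0), ("10x", 10.0), ("1000x", 1000.0)]:
    reduction = relative_loss_reduction(ratio)
    print(f"{label:>6} size ratio -> {reduction:.1%} lower loss")
```

Under these assumptions a 2x difference buys about a 7% lower loss, while 10x and 1000x differences buy about 21% and 50% respectively, which is why the scaling laws view predicts only small differences across the human range.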
However, at least from our vantage point, this is not what we observe. Individual differences between humans (and also other animals) appear to very strongly impact performance on an extremely wide range of 'downstream tasks'. People with IQs at +3 standard deviations, despite their rarity in the population, are responsible for the vast majority of intellectual advancement, while humans with IQs at -3 standard deviations are extremely challenged by even simple intellectual tasks. This seems like a very large variation in objective performance which is not predicted by the scaling l...