Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some Arguments Against Strong Scaling, published by Joar Skalse on January 13, 2023 on The AI Alignment Forum.
There are many people who believe that we will be able to get to AGI by basically just scaling up the techniques used in recent large language models, combined with some relatively minor additions and/or architectural changes. As a result, there are people in the AI safety community who now predict timelines of less than 10 years, and structure their research accordingly. However, there are also people who still believe in long(er) timelines, or at least that substantial new insights or breakthroughs will be needed for AGI (even if those breakthroughs could in principle happen quickly). My impression is that the arguments for the latter position are not all that widely known in the AI safety community. In this post, I will summarise as many of these arguments as I can.
I will almost certainly miss some arguments; if so, I would be grateful if they could be added to the comments. My goal with this post is not to present a balanced view of the issue, nor is it to present my own view. Rather, my goal is just to summarise as many arguments as possible for being skeptical of short timelines and the "scaling is all you need" position.
This post is structured into four sections. In the first section, I give a rough overview of the "scaling is all you need" hypothesis, together with a basic argument for that hypothesis. In the second section, I give a few general arguments in favour of significant model uncertainty when it comes to arguments about AI timelines. In the third section, I give some arguments against the standard argument for the "scaling is all you need" hypothesis, and in the fourth section, I give a few direct arguments against the hypothesis itself. I then end the post with a few closing words.
Glossary:
LLM - Large Language Model
SIAYN - Scaling Is All You Need
The View I'm Arguing Against
In this section, I will give a brief summary of the view that these arguments oppose, as well as provide a standard justification for this view. In short, the view is that we can reach AGI by more or less simply scaling up existing methods (in terms of the size of the models, the amount of training data they are given, and/or the number of gradient steps they take, etc.). One version says that we can do this by literally just scaling up transformers, but the arguments will apply even if we relax this to allow scaling of large deep learning-based next-token predictors, even if they would need to be given a somewhat different architecture, and even if some extra ingredient would be needed, etc.
Why believe this? One argument goes like this:
(1) Next-word prediction is AI-complete. This would mean that if we can solve next-word prediction, then we would also be able to solve any other AI problem. Why think next-word prediction is AI-complete? One reason is that human-level question answering is believed to be AI-complete, and question answering can be reduced to next-word prediction (a sketch of this reduction is given after this argument).
(2) The performance of LLMs at next-word prediction improves smoothly as a function of parameter count, training time, and amount of training data. Moreover, this performance trend asymptotes at (at least) human-level performance (see the scaling-law sketch after this argument).
(∴) Hence, if we keep scaling up LLMs, we will eventually reach human-level performance at next-word prediction, and therefore also reach AGI.
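To make premise (1) concrete, here is a minimal sketch of the reduction from question answering to next-word prediction: the question is supplied as a text prompt, and the answer is produced by repeatedly predicting the most likely next token. The model choice (GPT-2 via the Hugging Face transformers library) and the prompt format are illustrative assumptions, not something the argument depends on.

```python
# Minimal sketch: question answering reduced to next-word prediction.
# The model (GPT-2) and the prompt format are illustrative choices.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Frame the question as a text prefix whose continuation is the answer.
prompt = "Q: What is the capital of France?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding: each step simply predicts the most likely next token,
# so producing an answer is nothing more than iterated next-word prediction.
output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The point of the sketch is only that any question-answering task can be rephrased as "predict the continuation of this text", which is why solving next-word prediction at a sufficiently high level would also solve question answering.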
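Premise (2) is usually supported by pointing to empirical scaling laws. As a rough sketch (the particular functional form and the symbols below are an assumption in the style of published scaling-law fits, not a claim made by the argument itself), the fitted curves typically look like a power law with an irreducible term:

$$ L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} $$

where L is the next-word prediction loss, N is the number of parameters, D is the number of training tokens, and E is an irreducible loss floor. On this reading, premise (2) is the claim that the loss descends smoothly along such a curve as we scale, and that the level it asymptotes to corresponds to at-least-human-level prediction.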
An issue with this argument, as stated, is that GPT-3 is already better than humans at next-word prediction; in fact, so are both GPT-2 and GPT-1 (see this link). This shows that human-level performance on next-word prediction (in terms of accuracy) is evidently insufficient to attain human-level performance in question answering.
There are at least two ways to amend the argument:
(3) In...