July 22, 2023

LW - BCIs and the ecosystem of modular minds by beren

23 minutes

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: BCIs and the ecosystem of modular minds, published by beren on July 21, 2023 on LessWrong.

Crossposted from my personal blog.

Epistemic status: Much more speculative than previous posts but points towards an aspect of the future that is becoming clearer which I think is underappreciated at present. If you are interested in any of these thoughts please reach out.

For many years, the primary AI risk model was one of rapid take-off (FOOM) of a single AI entering a recursive self-improvement loop and becoming utterly dominant over humanity. There were lots of debates about whether this 'fast-takeoff' model was correct or whether instead we would enter a slow-takeoff regime. In my opinion, the evidence is pretty definitive that at the moment we are entering a slow-takeoff regime, and arguably have been in it for the last few years (historically takeoff might be dated to the release of GPT-3).

The last few years have undoubtedly been years of scaling very large monolithic models. The primary mechanism of improvement has been increasing the size of a monolithic general model. We have discovered that a single large model can outperform many small, specialized models on a wide variety of tasks. This trend is especially strong for language models. We also see a similar trend in image models and other modalities, where large transformer or diffusion architectures work extremely well and scaling them up in both parameter count and data leads to large and predictable gains. However, this scaling era will soon have to come to an end, at least temporarily, because the size of training runs and models is rapidly exceeding what companies can realistically spend on compute (and what NVIDIA can produce). GPT-4 training cost at least $100m. It is likely that GPT-5, or a successor run in the next few years, will cost more than $1B.

At this scale, only megacap tech companies can afford another OOM, and beyond that there are only powerful nation-states, which seem to be years away. Other modalities such as visual and audio have several more OOMs of scaling to go yet, but if the demand is there these can also be exhausted within a few years. More broadly, scaling up model training is now a firmly understood process that has moved from science to engineering. There now exist battle-tested libraries (both internal to companies and somewhat open-source) that allow large-scale training runs to be bottlenecked primarily by hardware rather than by sorting out the software and parallelism stack.

Beyond a priori considerations, there are also some direct signals. Sam Altman recently said that scaling will not be the primary mechanism for improvement in the future. Other researchers have expressed similar views. Of course scaling will continue well into the future, and there is also much low-hanging fruit in efficiency improvements, both in terms of parameter efficiency and data efficiency. However, if we do not reach AGI in the next few years, then it seems increasingly likely that we will not reach AGI in the near future simply by scaling.

If this is true, we will move into a slow-takeoff world. AI technology will still improve, but will become much more democratized and distributed than at present. Many companies will catch up to the technological frontier, and foundation model inference and even training will increasingly become a commodity. More and more of the economy will be slowly automated, although there will be a lot of lag here due to the large amount of low-hanging fruit, the need for maturity of the underlying software stack and business models, and the simple fact that things progress slowly in the real world. AI progress will look a lot more like electrification (as argued by Scott Alexander) than like nuclear weapons or some other decisive technological breakthrough.

What will be...
