Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On the future of language models, published by Owen Cotton-Barratt on December 21, 2023 on The Effective Altruism Forum.
1. Introduction
1.1 Summary of key claims
Even without further breakthroughs in AI, language models will have big impacts in the coming years, as people start sorting out proper applications
The early important applications will be automation of expert advisors, management, and perhaps software development
The more transformative but harder prizes are automation of research and automation of executive capacity
In their most straightforward form ("foundation models"), language models are a technology which naturally scales to something in the vicinity of human-level (because it's about emulating human outputs), not one that naturally shoots way past human-level performance
i.e. it is a mistake in principle to imagine projecting out the GPT-2 → GPT-3 → GPT-4 capability trend into the far-superhuman range
Although they're likely to be augmented by things which accelerate progress, this still increases the likelihood of a relatively slow takeoff: several years (rather than weeks or months) of transformative growth before truly wild things happen seems plausible
NB a version of "speed superintelligence" could still be transformative even while performance on individual tasks remains firmly human-level
There are two main techniques which can be used (probably in conjunction) to get language models to do more powerful things than foundation models are capable of:
Scaffolding: structured systems to provide appropriate prompts, including as a function of previous answers (a minimal sketch follows this list of claims)
Finetuning: altering model weights to select for performance on a particular task
Each of these techniques has a path to potentially scale to strong superintelligence; alternatively language models might at some point be obsoleted by another form of AI
Timelines for any of these things seem pretty unclear
From a safety perspective, language model agents whose agency comes from scaffolding look greatly superior to ones whose agency comes from finetuning
Because you can get an extremely high degree of transparency by construction
Finetuning is more likely to be an important tool for instilling virtues (e.g. honesty) in systems
Sutton's Bitter Lesson raises questions for this strategy, but needn't mean it's doomed to be outcompeted
On the likely development trajectory there are a number of distinct existential risks
e.g. guarding against takeover from early language model agents is pretty different from differential technological development to ensure that we automate safety-enhancing research before risk-increasing research
The current portfolio of work on AI risk is over-indexed on work which treats "transformative AI" as a black box and tries to plan around that. I think that we can and should be peering inside that box (and this may involve plans targeted at more specific risks).
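To make the scaffolding idea a bit more concrete, here is a minimal sketch in Python. It assumes a hypothetical call_model function standing in for whichever completion API is used (none of the names here are from the post itself): the prompt structure lives in ordinary code, later prompts are built from earlier answers, and every intermediate step is a human-readable string, which is the sense in which agency from scaffolding can be transparent by construction.

```python
def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a language model completion API (an assumption,
    # not anything named in the post); plug in whatever API you actually use.
    raise NotImplementedError

def answer_with_scaffolding(question: str) -> str:
    # Step 1: ask the model to break the question into sub-questions.
    plan = call_model(f"List, one per line, the sub-questions needed to answer:\n{question}")

    # Step 2: answer each sub-question, feeding earlier answers back in,
    # so that later prompts are a function of previous answers.
    notes: list[str] = []
    for sub_question in plan.splitlines():
        if not sub_question.strip():
            continue
        context = "\n".join(notes)
        notes.append(call_model(
            f"Context so far:\n{context}\n\nAnswer briefly: {sub_question}"
        ))

    # Step 3: synthesise a final answer from the accumulated notes.
    # Every prompt and intermediate answer above is an ordinary string the
    # operator can inspect, which is the "transparency by construction" point.
    return call_model(
        f"Question: {question}\nNotes:\n" + "\n".join(notes) + "\nFinal answer:"
    )
```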
1.2 Meta
We know that AI is likely to be a very transformative technology. But a lot of the analysis of this point treats something like "AGI" as a black box, without thinking too much about the underlying tech which gets there. I think that's a useful mode, but it's also helpful to look at specific forms of AI technology and ask where they're going and what the implications are.
This doc does that for language models. It's a guide for thinking about them from various angles with an eye to what the strategic implications might be. Basically I've tried to write the thing I wish I'd read a couple of years ago; I'm sharing now in case it's helpful for others.
The epistemic status of this is "I thought pretty hard about this and these are my takes"; I'm sure there are still holes in my thinking (NB I don't actually do direct work with language models), and I'd appreciate pushback; but I'm also pretty sure I'm ...