The Nonlinear Library

LW - The Agency Overhang by Jeffrey Ladish



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Agency Overhang, published by Jeffrey Ladish on April 21, 2023 on LessWrong.
As language models become increasingly capable of performing cognitive tasks that humans can solve quickly, it appears that a "cognitive capability : agency overhang" is emerging. We have powerful systems that currently have little ability to carry out complex, multi-step plans, but at some point, these powerful yet not-very-agentic systems may develop sophisticated planning and execution abilities. Since "fast cognition capability : agency overhang" is unwieldy, I will shorten this to “agency overhang”.
By agency, I mean the ability to generate and carry out complex plans to do specific things, like run a software company, or run a scientific research program investigating cancer treatments. I think of a system as “more agentic” when it can carry out more complex plans that take more steps to accomplish.
It’s hard to estimate how quickly planning and execution abilities could be developed from state-of-the-art (SOTA) language models, but there is some risk these abilities could develop quickly given the right training environment or programmatic scaffolding (e.g. something like AutoGPT). This could look like a sharp left turn that happens very suddenly during training, or a smoother-but-still-fast development taking weeks or months. My claim is that any of these relatively fast transitions from “systems with superhuman cognitive abilities on short-time-horizon tasks but poor planning and execution ability” to “systems that have these abilities plus impressive planning and execution ability” would be very dangerous, not only because rapid gains in cognitive capabilities are generally risky, but also because people might underestimate how quickly models could gain the dangerous planning and execution abilities.
Below I discuss how people are experimenting with making large language models more agentic through the use of programmatic scaffolding. Before I do, I want to emphasize the more general point that an agency overhang is concerning because the risks from an overhang don’t depend on exactly how the agentic-capabilities gap gets closed, only on the fact that it does. We currently possess powerful cognitive engines that perform superhumanly well along many dimensions, but not all dimensions, of general intelligence. At some point we will close this gap by finding ways to train or program AI systems to have the capabilities they need to plan, execute, and iterate well. This may be through the current scaling regime using similar internet training data. It may be through utilizing new training techniques, architectures, or types of training data. It may be through building new types of scaffolding.
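To make the scaffolding idea concrete, here is a minimal sketch of the plan-act-observe loop that AutoGPT-style tools wrap around a language model. Everything here is illustrative: `stub_model` is a hypothetical stand-in for a real LLM API call, and the "execution" step is a placeholder for real tool use.

```python
# Minimal sketch of agentic scaffolding around a language model,
# the general pattern AutoGPT-style tools use. The model below is a
# hypothetical stub; a real scaffold would call an LLM API instead.

def stub_model(goal, history):
    """Hypothetical stand-in for an LLM call: proposes the next action
    given the goal and the observations so far."""
    step = len(history) + 1
    if step > 3:
        return "DONE"  # the model decides the goal is accomplished
    return f"step {step} toward: {goal}"

def run_agent_loop(goal, model, max_steps=10):
    """Plan-act-observe loop: repeatedly ask the model for the next
    action, 'execute' it, and feed the result back as context."""
    history = []
    for _ in range(max_steps):
        action = model(goal, history)
        if action == "DONE":
            break
        # In a real scaffold this would run a tool (web search,
        # code execution, file I/O) and capture its output.
        observation = f"executed: {action}"
        history.append(observation)
    return history

trace = run_agent_loop("summarize a paper", stub_model)
```

The point of the sketch is that the loop itself is trivial: all of the planning ability lives in the model call, which is why scaffolding can unlock agentic behavior quickly once the underlying model is capable enough.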
Regardless, there is significant risk that AI systems could cross this threshold quickly, and go from relatively harmless to extremely dangerous in a short amount of time.
Once AI systems with superhuman capabilities along many dimensions close the agency gap and are able to plan and execute well, they will be much more capable of acquiring more financial, social, and computational resources, manipulating large groups of humans, and recursively self-improving.
An agency overhang is bad in that it makes experimenting with building agentic systems much more dangerous. I’m not against building agents to study how to align them. I think we'll need to study agency empirically before we can solve alignment. But we should do this extremely carefully, with significant security measures and information management (don't publish AI agent capability results), starting with less powerful models and using lots of interpretability tools. In an ideal world, we would study many different types of weak agentic systems this way, look at their internals, test them in lots of different kinds of environments, and come ...

The Nonlinear Library, by The Nonlinear Fund
