April 09, 2023

LW - Agentized LLMs will change the alignment landscape by Seth Herd

5 minutes

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Agentized LLMs will change the alignment landscape, published by Seth Herd on April 9, 2023 on LessWrong.

Epistemic status: head spinning, suddenly unsure of everything in alignment. And unsure of these predictions.

I'm following the suggestions in "10 reasons why lists of 10 reasons might be a winning strategy" in order to get this out quickly (reason 10 will blow your mind!). I'm hoping to prompt some discussion, rather than try to do the definitive writeup on this topic when this technique was introduced so recently.

Ten reasons why agentized LLMs will change the alignment landscape:

Agentized LLMs like Auto-GPT and Baby AGI may fan the sparks of AGI in GPT-4 into a fire. These techniques use an LLM as a central cognitive engine, within a recursive loop of breaking a task goal into subtasks, working on those subtasks (including calling other software), and using the LLM to prioritize subtasks and decide when they're adequately well done. They recursively check whether they're making progress on their top-level goal.

While it remains to be seen what these systems can actually accomplish, I think it's very likely that they will dramatically enhance the effective intelligence of the core LLM. I think this type of recursivity and breaking problems into separate cognitive tasks is central to human intelligence. This technique adds several key aspects of human cognition: executive function; reflective, recursive thought; and episodic memory for tasks, despite using non-brainlike implementations. To be fair, the existing implementations seem pretty limited and error-prone. But they were implemented in days. So this is a prediction of near-future progress, not a report on amazing new capabilities.
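To make the loop described above concrete, here is a minimal sketch of its shape: decompose a goal into subtasks, let the model reprioritize, work on one subtask, then ask the model whether the top-level goal is met. This is not the actual code of Auto-GPT or Baby AGI. The names `stub_llm` and `run_agent`, and the prompt prefixes `DECOMPOSE:`, `PRIORITIZE:`, `EXECUTE:`, and `DONE?`, are all hypothetical, and the "LLM" is a deterministic stand-in so the example runs without any API.

```python
from collections import deque
from typing import Callable, Deque, List

# Assumption: an "LLM" is any function from a prompt string to a reply string.
LLM = Callable[[str], str]


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model call (hypothetical, no API used)."""
    if prompt.startswith("DECOMPOSE:"):
        goal = prompt[len("DECOMPOSE:"):].strip()
        return f"research {goal}\ndraft {goal}\nreview {goal}"
    if prompt.startswith("PRIORITIZE:"):
        # A real model might reorder the tasks; the stub keeps their order.
        return prompt.split("\n", 1)[1]
    if prompt.startswith("EXECUTE:"):
        return "result of " + prompt[len("EXECUTE:"):].strip()
    if prompt.startswith("DONE?"):
        # The stub calls the goal met once three results have accumulated.
        return "yes" if prompt.count("result of") >= 3 else "no"
    return ""


def run_agent(goal: str, llm: LLM, max_steps: int = 10) -> List[str]:
    """Decompose a goal, work through subtasks, and let the LLM decide when to stop."""
    tasks: Deque[str] = deque(llm(f"DECOMPOSE: {goal}").splitlines())
    results: List[str] = []
    for _ in range(max_steps):  # hard cap so the loop always terminates
        if not tasks:
            break
        # Ask the model to reprioritize the remaining subtasks.
        tasks = deque(llm("PRIORITIZE:\n" + "\n".join(tasks)).splitlines())
        task = tasks.popleft()
        results.append(llm(f"EXECUTE: {task}"))
        # Check progress against the top-level goal.
        if llm(f"DONE? goal={goal} results={results}") == "yes":
            break
    return results


if __name__ == "__main__":
    for line in run_agent("write a report", stub_llm):
        print(line)
```

In a real system each of the four prompts would go to the same underlying model, and the `EXECUTE:` step is where calls to other software would happen. The `max_steps` cap is a design choice of this sketch, not something the systems above are known to share.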

This approach appears to be easier than I'd thought. I've been expecting this type of self-prompting to imitate the advantages of human thought, but I didn't expect the cognitive capacities of GPT-4 to make it so easy to do useful multi-step thinking and planning. The ease of initial implementation (something like 3 days, with all of the code also written by GPT-4 for Baby AGI) implies that improvements may also be easier than we would have guessed.

Integration with HuggingGPT and similar approaches can provide these cognitive loops with more cognitive capacities. This integration was also easier than I'd have guessed, with GPT-4 learning from a handful (e.g., 40) of examples how to use other software tools. Those tools will include both sensory capacities, with vision models and other sensory models of various types, and the equivalent of a variety of output capabilities.

Integration of recursive LLM self-improvement like "Reflexion" can utilize these cognitive loops to make the core model better at a variety of tasks.

Easily agentized LLMs are terrible news for capabilities. I think we'll have an internet full of LLM-bots "thinking" up and doing stuff within a year.

This is absolutely bone-chilling for the urgency of the alignment and coordination problems. Some clever chucklehead already created ChaosGPT, an instance of Auto-GPT given the goal to destroy humanity and create chaos. You are literally reading the thoughts of something thinking about how to kill you. It's too stupid to get very far, but it will get smarter with every LLM improvement, and every improvement to the recursive self-prompting wrapper programs. This gave me my very first visceral fear of AGI destroying us. I recommend it, unless you're already plenty viscerally freaked out.

Watching agents think is going to shift public opinion. We should be ready for more AI scares and changing public beliefs. I have no idea how this is going to play out in the political sphere, but we need to figure this out to have a shot at successful alignment, because

We will be in a multilateral AGI world. Anyone can spawn a dumb AGI...
