The Nonlinear Library: Alignment Forum

AF - Goodbye, Shoggoth: The Stage, its Animatronics, and the Puppeteer - a New Metaphor by Roger Dearnaley



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer - a New Metaphor, published by Roger Dearnaley on January 9, 2024 on The AI Alignment Forum.
Thanks to Quentin FEUILLADE--MONTIXI for the discussion in which we came up with this idea together, and for feedback on drafts.
TL;DR A better metaphor for how LLMs behave, how they are trained, and particularly for how to think about the alignment strengths and challenges of LLM-powered agents. This is informed by simulator theory - hopefully people will find it more detailed, specific, and helpful than the old shoggoth metaphor.
Humans often think in metaphors. A good metaphor can provide a valuable guide to intuition, or a bad one can mislead it. Personally I've found the shoggoth metaphor for LLMs rather useful, and it has repeatedly helped guide my thinking (as long as one remembers that the shoggoth is a shapeshifter, and thus a very contextual beast).

However, as posts like Why do we assume there is a "real" shoggoth behind the LLM? Why not masks all the way down? make clear, not everyone finds this metaphor very helpful (my reaction was "Of course it's masks all the way down - that's what the eyes symbolize! It's made of living masks: masks of people."). Which admittedly doesn't match H.P. Lovecraft's description; perhaps it helps to have spent time playing around with base models in order to get to know the shoggoth a little better (if you haven't, I recommend it).
So, I thought I'd try to devise a more useful and detailed metaphor, one that was a better guide for intuition, especially for alignment issues. During a conversation with Quentin FEUILLADE--MONTIXI we came up with one together (the stage and its animatronics were my suggestions, the puppeteer was his, and we tweaked it together). I'd like to describe this, in the hope that other people find it useful (or else that they rewrite it until they find one that works better for them).
Along the way, I'll show how this metaphor can help illuminate a number of LLM behaviors and alignment issues, some well known, and others that seem to be less widely-understood.
A Base Model: The Stage and its Animatronics
A base-model LLM is like a magic stage. You construct it, then you read it or show it (at enormous length) a large proportion of the internet, and, if you wish, also books, scientific papers, images, movies, or whatever else you want. The stage is inanimate: it's not agentic, it's goal agnostic (well, unless you want to consider 'contextually guess the next token' to be a goal, but it's not going to cheat by finding a way to make the next token more predictable, because that wasn't possible during its training, and it isn't agentic enough to conceive that this might even be possible outside it). No Reinforcement Learning (RL) was used in its training, so concerns around Outer Alignment don't apply to it - we know exactly what its training objective was: guess next tokens right, just as we intended. We now even have some mathematical idea of what it's optimizing. Nor, as we'll discuss later, do concerns around deceit, situational awareness, or gradient hacking apply to it. At this point, it's myopic, tool AI: it doesn't know or care whether we or the material world even exist; it only cares about the distribution of sequences of tokens, and all it does is repeatedly, contextually generate a guess of the next token. So it plays madlibs like a professional gambler, in the same blindly monomaniacal sense that a chess machine plays chess like a grandmaster. By itself, the only risk from it is the possibility that someone else might hack your computer network to steal its weights, and what they might then do with it.
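To make the "all it does is contextually guess the next token" point concrete, here is a minimal sketch (not code from the post; the toy model, vocabulary size, and names are illustrative assumptions): the entire pretraining signal is next-token cross-entropy, and generation is nothing more than repeatedly sampling one more token from the model's predicted distribution.

```python
# Illustrative sketch only: a stand-in "base model" with the same shape of
# objective and inference loop as an LLM, at toy scale. All names and
# hyperparameters here are assumptions for illustration, not the post's.

import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 256   # toy byte-level vocabulary (assumption)
EMBED_DIM = 64


class ToyBaseModel(nn.Module):
    """Maps a token prefix to logits over the next token (causal by construction)."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.rnn = nn.GRU(EMBED_DIM, EMBED_DIM, batch_first=True)
        self.head = nn.Linear(EMBED_DIM, VOCAB_SIZE)

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        hidden, _ = self.rnn(self.embed(tokens))  # (batch, seq_len, EMBED_DIM)
        return self.head(hidden)                  # logits: (batch, seq_len, VOCAB_SIZE)


def pretraining_loss(model, tokens):
    """'Guess the next token right': cross-entropy of token t+1 given tokens <= t."""
    logits = model(tokens[:, :-1])                # predictions for each next position
    targets = tokens[:, 1:]                       # the tokens that actually came next
    return F.cross_entropy(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))


@torch.no_grad()
def generate(model, prompt, n_new, temperature=1.0):
    """Repeatedly, contextually guess one more token; nothing else is going on."""
    tokens = prompt.clone()
    for _ in range(n_new):
        logits = model(tokens)[:, -1, :] / temperature
        next_token = torch.multinomial(F.softmax(logits, dim=-1), num_samples=1)
        tokens = torch.cat([tokens, next_token], dim=1)
    return tokens


if __name__ == "__main__":
    model = ToyBaseModel()
    batch = torch.randint(0, VOCAB_SIZE, (4, 32))     # stand-in for training text
    print("pretraining loss:", pretraining_loss(model, batch).item())
    prompt = torch.randint(0, VOCAB_SIZE, (1, 8))     # a stand-in prompt
    print("continuation:", generate(model, prompt, n_new=16).tolist())
```

The point of the sketch is that there is no reward signal, no environment, and no action other than emitting a distribution over the next token; everything else about a base model's behavior comes from the data it was trained to imitate.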
Once you're done training the stage, you have a base model. Now you can flip its switch, tell the stage the title of a play, or bett...