December 31, 2022

AF - 'simulator' framing and confusions about LLMs by Beth Barnes

6 minutes

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 'simulator' framing and confusions about LLMs, published by Beth Barnes on December 31, 2022 on The AI Alignment Forum.

Post status: pretty rough + unpolished, thought it might be worthwhile getting this out anyway.

I feel like I've encountered various people having misunderstandings of LLMs that seem to be related to using the 'simulator' framing. I'm probably being horrendously uncharitable to the people in question, I'm not confident that anyone actually holds any of the opinions that are outlined below, and even if they do I'm not sure that they're actually attributable to the simulators framing, but it seemed like it might be useful to point at areas of potential confusion.

In general I'm skeptical that the simulator framing adds much relative to 'the model is predicting what token would appear next in the training data given the input tokens'. I think it's pretty important to think about what exactly is in the training data, rather than about some general idea of accurately simulating the world.
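To make the 'predicting what would appear next in the training data' framing concrete, here is a minimal toy sketch. It is not how a real LLM works internally; it is just a count-based predictor over a made-up three-document corpus, and the names `train` and `predict` are hypothetical. The point it illustrates is that the output is entirely a function of what was in the training data.

```python
from collections import Counter, defaultdict

def train(corpus, context_len=2):
    """Count which token followed each context in the (toy) training data."""
    counts = defaultdict(Counter)
    for doc in corpus:
        tokens = doc.split()
        for i in range(context_len, len(tokens)):
            context = tuple(tokens[i - context_len:i])
            counts[context][tokens[i]] += 1
    return counts

def predict(counts, context):
    """Next-token distribution for a context seen in training, else None."""
    seen = counts.get(tuple(context))
    if not seen:
        return None  # the training data says nothing about this context
    total = sum(seen.values())
    return {tok: n / total for tok, n in seen.items()}

# Made-up corpus, purely for illustration.
corpus = [
    "the experiment showed a large effect",
    "the experiment showed a large effect",
    "the experiment showed no effect",
]
counts = train(corpus)

# Reflects what people wrote in the corpus, not what the experiment "really" did.
print(predict(counts, ["experiment", "showed"]))
# A context that never appeared has no prediction without some extra rule.
print(predict(counts, ["reactor", "showed"]))
```

The first call returns a distribution that mirrors how often each continuation was written down; the second returns `None`, because nothing in the data covers that context.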

Perfect predictors

I've encountered people thinking about idealized LLMs that have perfect predictive accuracy, suggesting that e.g. instead of using the model to help you hack into some system, you could just get it to emulate a terminal on that system then extract whatever info you wanted to extract. I think there are two issues here:

Thinking about it as 'you prompt it with some setting in the world, then it predicts this perfectly'

There's not a well-defined correct generalization unless this exact sequence of tokens was actually in the training data. (Paul has a post somewhere which talks about this 'what is actually the correct generalization' thing that I wanted to link, but I can't currently find it.)

The 'correct generalization' in some sense is 'what would have followed this if it somehow was included in the training data' - which is not necessarily the 'real' version of the thing you're trying to predict. E.g. if you prompt it to get it to produce the output of some very expensive experiment that humans are unlikely to have actually run, then your model might predict what humans would have written if they'd put a made-up version of this in the training set, rather than what would actually happen if you ran the experiment.

I think that by the time you can use your model to give you detailed terminal outputs for a specific system, including passwords, entire model weights, etc., a bunch of transformative things will already have happened, so it's not really worth thinking about this kind of thing.
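As a rough illustration of the 'no well-defined correct generalization' point above, here is a toy sketch in which two different fallback rules are both fit to the same made-up training data, agree with it equally well, and still disagree on a context that never appeared. The corpus and the function names `backoff_to_last_token` and `backoff_to_unigram` are invented for this example; real models generalize in far more complicated ways.

```python
from collections import Counter, defaultdict

# Made-up training documents, purely for illustration.
docs = [
    "the lab reported success",
    "the lab reported success",
    "the press reported failure",
    "failure after failure after failure",
]

bigram = defaultdict(Counter)   # counts of next token given the previous token
unigram = Counter()             # overall token counts
for doc in docs:
    tokens = doc.split()
    unigram.update(tokens)
    for prev, nxt in zip(tokens, tokens[1:]):
        bigram[prev][nxt] += 1

def backoff_to_last_token(context):
    """Generalize by ignoring everything except the final token of the context."""
    return bigram[context[-1]].most_common(1)[0][0]

def backoff_to_unigram(context):
    """Generalize by ignoring the context and picking the most common token overall."""
    return unigram.most_common(1)[0][0]

# "startup reported" never appears in the training data.
unseen = ["startup", "reported"]
print(backoff_to_last_token(unseen))
print(backoff_to_unigram(unseen))
```

Neither rule is 'the' correct one: the training data alone does not pick between them, and neither answer tells you what the startup actually reported.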

Physics simulators

Relatedly, I've heard people reason about the behavior of current models as if they're simulating physics and going from this to predictions of which tokens will come next, which I think is not a good characterization of current or near-future systems. Again, my guess is that very transformative things will happen before we have systems that are well-understood as doing this.

Confusion about hallucinations

There's a specific subset of hallucination I refer to as 'offscreen text hallucination', where the model implies that the prompt contains some chunk of text that it doesn't. E.g., if you give it a prompt with some commands trying to download and view a page, and the output, it does things like say 'That output is a webpage with a description of X', when in fact the output is blank or some error or something.

Example prompt:

Please answer these questions about the blog post: What does the post say about the history of the field?

Completion:

I think this happens in part because the model has seen documents with missing text, where things were e.g. in an embedded image, or stripped out by the data processing, or whatever. This is different from other types of hallucinations, like: - hallucinating details about something but not implying it appears in the pr...
