The Nonlinear Library

LW - Against LLM Reductionism by Erich Grunewald



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Against LLM Reductionism, published by Erich Grunewald on March 8, 2023 on LessWrong.
Summary
Large language models (henceforth, LLMs) are sometimes said to be "just" shallow pattern matchers, "just" massive look-up tables or "just" autocomplete engines. These comparisons amount to a form of (methodological) reductionism. While there's some truth to them, I think they smuggle in corollaries that are either false or at least not obviously true.
For example, they seem to imply that what LLMs are doing amounts merely to rote memorisation and/or clever parlour tricks, and that they cannot generalise to out-of-distribution data. In fact, there's empirical evidence that suggests that LLMs can learn general algorithms and can contain and use representations of the world similar to those we use.
They also seem to suggest that LLMs merely optimise for success on next-token prediction. It's true that LLMs are (mostly) trained on next-token prediction, and it's true that this profoundly shapes their output, but we don't know whether this is how they actually function. We also don't know what sorts of advanced capabilities can or cannot arise when you train on next-token prediction.
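To make "trained on next-token prediction" concrete, here is a minimal sketch of that objective applied to the simplest possible model: a literal look-up table of bigram counts. This toy is not how LLMs are built, and every name in it (`train_bigram_table`, `next_token_probs`, `next_token_loss`) is hypothetical and chosen for illustration. It only shows what the loss measures, and how a pure look-up table has nothing to say about a context it has never seen.

```python
import math
from collections import Counter, defaultdict


def train_bigram_table(tokens):
    """Count how often each token follows each other token (a literal look-up table)."""
    table = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table


def next_token_probs(table, prev):
    """Return the table's distribution over next tokens, or {} if `prev` was never seen."""
    counts = table.get(prev)
    if not counts:
        return {}
    total = sum(counts.values())
    return {tok: count / total for tok, count in counts.items()}


def next_token_loss(table, tokens):
    """Average negative log-likelihood of each token given the one before it."""
    losses = []
    for prev, nxt in zip(tokens, tokens[1:]):
        p = next_token_probs(table, prev).get(nxt, 0.0)
        # A pair the table has never seen gets probability 0, hence infinite loss.
        losses.append(-math.log(p) if p > 0 else math.inf)
    return sum(losses) / len(losses)


# Toy corpus, assumed purely for illustration.
corpus = "the cat sat on the mat".split()
table = train_bigram_table(corpus)
```

On its own training text the table does well, but `next_token_probs(table, "dog")` returns an empty distribution and the loss on any unseen pair is infinite. The objective is the same one LLMs are (mostly) trained on; the question the essay raises is what a far more expressive model ends up learning in order to minimise it.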
So there's reason to be cautious when thinking about LLMs. In particular, I think, caution should be exercised (1) when making predictions about what LLMs will or will not in future be capable of and (2) when assuming that such-and-such a thing must or cannot possibly happen inside an LLM.
Pattern Matchers, Look-up Tables, Stochastic Parrots
My understanding of what goes on inside machine learning (henceforth, ML) models, and LLMs in particular, is still in many ways rudimentary, but it seems clear enough that, however tempting it is to imagine otherwise, it's little like what goes on in the minds of humans; it's weirder than that, more alien, more eldritch. As LLMs have been scaled up, and more compute and data have been poured into models with more parameters, they have undergone qualitative shifts, and are now capable of a range of tasks their predecessors couldn't even grasp, let alone fail at, even as they have retained essentially the same architecture and training process.[1] How do you square their awesome, if erratic, brilliance with the awareness that their inner workings are so ordinary?
One route would be to directly deny the brilliance. Gary Marcus does this, pointing out, and relishing, the myriad ways that LLMs misfire. Their main limits are, he says, that they are unreliable and untruthful. (See the footnote for my thoughts on that.[2])
That's one route, but it's not the one I want to discuss here. The route I want to discuss here is to dispel the magic, so to speak: to argue that what goes on inside LLMs is "shallow", and that LLMs lack "understanding". This often takes the form of asserting that LLMs are just doing pattern matching[3], or just rephrasing material from the web[4], amounting to mere stochastic parrots[5], or just retrieving things from a massive look-up table. Gary Marcus describes the underlying problem as one of "a lack of cognitive models of the world":
The improvements, such as they are, come primarily because the newer models have larger and larger sets of data about how human beings use word sequences, and bigger word sequences are certainly helpful for pattern matching machines. But they still don't convey genuine comprehension, and so they are still very easy [...] to break.
Well, in a certain light and for the sake of fairness, this view is not entirely wrong:
LLMs are, in a sense, pattern matching. They likely have a great many attention heads and neurons and whatever that detect certain patterns in the input, which then help determine the model's output.
LLMs are, in a sense, merely rephrasing material from the web. All, or nearly all, of the data that th...

The Nonlinear Library, by The Nonlinear Fund
