
If you start from the premise that a language model like ChatGPT is a very flexible, very high-dimensional, very-big-data regression-and-classification engine, best seen as a function from the domain of word-strings to the range of continuation words, I think a large number of things become clear.
First, because its training dataset is sparse in its potential domain—nearly all even moderate-length word-sequences that are not boilerplate or cliché are unique—its task is one of interpolation: take word-sequences “close” to the prompt, examine their continuations, and average them. Thus while pouring more and more resources into the engine does get you, potentially, a finer and finer interpolation, it seems highly likely that this process will have limits rather than grow to the sky, and it is better to look at it as an engine summarizing what humans typically say in analogous linguistic situations rather than any form of “thinking”.
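To make that concrete, here is a deliberately toy sketch of my own, in Python, of what "a function from word-strings to continuation words, computed by interpolating among nearby training sequences" might look like. Nothing here corresponds to how a real transformer is implemented; the corpus, the context length, and the similarity measure are all invented for illustration:

```python
# Toy sketch: an "LLM" as a function from word-strings to a distribution over
# continuation words, computed by interpolating among training contexts that
# look "close" to the prompt. Corpus, context length, and similarity measure
# are all invented for illustration.
from collections import Counter, defaultdict

TRAINING_TEXT = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog . the dog chased the cat ."
).split()

CONTEXT_LEN = 2  # how many trailing words count as the "context"

# Index every context in the corpus together with the word that followed it.
continuations = defaultdict(Counter)
for i in range(CONTEXT_LEN, len(TRAINING_TEXT)):
    context = tuple(TRAINING_TEXT[i - CONTEXT_LEN:i])
    continuations[context][TRAINING_TEXT[i]] += 1

def similarity(a, b):
    """Crude stand-in for 'closeness' of two contexts: fraction of positions that match."""
    return sum(x == y for x, y in zip(a, b)) / CONTEXT_LEN

def continuation_distribution(prompt_words):
    """The function (word-strings) -> (continuation words): average the observed
    continuations of every training context, weighted by closeness to the prompt."""
    context = tuple(prompt_words[-CONTEXT_LEN:])
    weighted = Counter()
    for train_context, counts in continuations.items():
        w = similarity(context, train_context)
        for word, n in counts.items():
            weighted[word] += w * n
    total = sum(weighted.values()) or 1.0
    return {word: n / total for word, n in weighted.items()}

print(continuation_distribution("the dog sat on the".split()))
```

The point of the toy is only this: the output is a weighted average of what followed similar word-strings in the training data, which is interpolation, not deliberation.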
I think the post-ChatGPT history of LLMs bears this out:
Sebastian Raschka: The State of Reinforcement Learning for LLM Reasoning <https://magazine.sebastianraschka.com/p/the-state-of-llm-reasoning-model-training>: ‘Releases of new flagship models like GPT-4.5 and Llama 4…. Reactions to these releases were relatively muted…. The muted response… suggests we are approaching the limits of what scaling model size and data alone can achieve. However, OpenAI's recent release of the o3 reasoning model demonstrates there is still considerable room for improvement when investing compute strategically, specifically via reinforcement learning methods tailored for reasoning tasks. (According to OpenAI staff during the recent livestream, o3 used 10× more training compute compared to o1)…
Second, reinforcement learning, prompt engineering, and the like are ways of attempting to condition this interpolation process: they alter the domain word-string so as to carry it into a portion of the training dataset where, as judged by humans, the function (word-strings) → (continuations) does not suck. That is, in some sense, all they are. You have a function trained on internet dreck in which there are some veins of gold (accurate information and useful continuations of word-strings), and you need to transform the word-string you send so that it lands inside one of those veins.
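In code terms, and again only as a toy sketch of my own rather than anything any lab actually does, "prompting as conditioning" looks like wrapping the user's word-string in a prefix whose neighborhood in the training corpus tends to continue well. The few-shot template below is entirely made up, and continuation_distribution is the toy function sketched above:

```python
# Toy sketch: prompt engineering as conditioning. The fitted function never
# changes; we change the word-string we hand it, so the conditional distribution
# we sample from sits in a better-behaved neighborhood of the training data.
# The few-shot template is invented; continuation_distribution is the toy
# function from the earlier sketch.

FEW_SHOT_PREFIX = (
    "Q: What is 2 + 2 ? A: 4 . "
    "Q: What is the capital of France ? A: Paris . "
)

def steer(user_query):
    """Transform a raw query into a word-string whose nearby training contexts
    (Q/A-formatted text) tend to continue with terse, factual answers."""
    return FEW_SHOT_PREFIX + "Q: " + user_query + " A:"

raw = "What is the capital of Italy ?"
steered = steer(raw)

# Same model, two different conditional distributions:
# continuation_distribution(raw.split())      -> whatever tends to follow bare questions
# continuation_distribution(steered.split())  -> whatever tends to follow Q/A-formatted text
```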
As the wise Cosma Shalizi says:
Cosma Shalizi: On Feral Library Card Catalogs, or, Aware of All Internet Traditions <https://www.programmablemutter.com/cp/161552401>: ‘LLMs are parametric probability models of symbol sequences, fit to large corpora of text by maximum likelihood. By design, their fitting process strives to reproduce the distribution of text in the training corpus…. Prompting is conditioning: the output after a prompt is a sample from the conditional distribution of text coming after the prompt…. All these distributions are estimated with a lot of smoothing… that tell the probability model when to treat different-looking contexts as similar…. This smoothing… is what lets the models respond with sensible-looking output… to prompts they've never seen in the training corpus. It also, implicitly, tells the model what to ignore, what distinctions make no difference. This is part (though only part) of why these models are lossy…
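Shalizi's point about smoothing can be made concrete on the simplest possible model, a bigram counter. This is my illustration, not his; real LLMs smooth implicitly through their parametric form rather than with an explicit interpolation weight:

```python
# Toy sketch of smoothing on a bigram model. Pure maximum likelihood gives zero
# probability to any continuation never seen after a given word; an interpolated
# estimate backs off toward the unigram distribution, so the model still produces
# sensible-looking output for contexts it has never seen. Real LLMs smooth
# implicitly, through their parametric form, not with an explicit lambda.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

unigram = Counter(corpus)
bigram = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram[prev][nxt] += 1

def p_unsmoothed(nxt, prev):
    """Maximum likelihood: zero for anything never observed after `prev`."""
    total = sum(bigram[prev].values())
    return bigram[prev][nxt] / total if total else 0.0

def p_smoothed(nxt, prev, lam=0.7):
    """Interpolated smoothing: lean on the bigram where there is data,
    fall back toward the unigram distribution where there is not."""
    return lam * p_unsmoothed(nxt, prev) + (1 - lam) * unigram[nxt] / len(corpus)

print(p_unsmoothed("mat", "dog"))  # 0.0: "dog mat" never occurs in the corpus
print(p_smoothed("mat", "dog"))    # small but nonzero: the model can still respond
```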
And these models are likely to be very good and useful where:
Cosma Shalizi: On Feral Library Card Catalogs, or, Aware of All Internet Traditions <https://www.programmablemutter.com/cp/161552401>: ‘A huge amount of cultural and especially intellectual tradition consists of formulas, templates, conventions, and indeed tropes and stereotypes…. Formulas also reduce the cognitive burden on people receiving communications… boiler-plate and ritual, yes, but it's not just boiler-plate and ritual, or at least not pointless ritual…
But not so good and not so useful for many other tasks.
And now comes the part in which I note that someday this bashing-of-Artificial-Super-Intelligence-grifters-and-boosters may go wrong. I am, after all, making a version of the Searle Chinese Room argument against the claim that MAMLMs are brains. I believe that the argument applies here. But Scott Aaronson's comments on that kind of argument are apposite:
Scott Aaronson: PHYS771 Lecture 4: Minds and Machines <https://www.scottaaronson.com/democritus/lec4.html>: ‘In the last fifty years, have there been any new insights about the Turing Test?… A non-insight, which is called Searle's Chinese Room…. Let's say you don't speak Chinese…. You sit in a room, and someone passes you paper slips through a hole in the wall with questions written in Chinese, and you're able to answer the questions (again in Chinese) just by consulting a rule book. In this case, you might be carrying out an intelligent Chinese conversation, yet by assumption, you don't understand a word of Chinese!…
Like many other thought experiments, the Chinese Room gets its mileage from a deceptive choice of imagery—and more to the point, from ignoring computational complexity. We're invited to imagine someone pushing around slips of paper with zero understanding or insight—much like the doofus freshmen who write (a+b)² = a² + b² on their math tests. But how many slips of paper are we talking about?… If each page of the rule book corresponded to one neuron… we'd be talking about a "rule book" at least the size of the Earth, its pages searchable by a swarm of robots traveling at close to the speed of light…. Maybe it's not so hard to imagine that this enormous Chinese-speaking entity—this dian nao—that we've brought into being might have something we'd be prepared to call understanding or insight…
At what point does rote, or nearly rote, manipulation of symbols and interpolation from examples turn into real thinking? On the one hand, we do not know. On the other hand, I see absolutely no sign anywhere in these systems that they are close.
What, then, is going on with all the people who think that we are close?
My guess: From an evolutionary psychology standpoint, humans are pattern-recognition machines. We are primed to detect agency—even where it doesn’t exist—because assuming intent behind observed behavior was often adaptive. You hear a rustle in the bushes: it’s safer to think “predator” than “wind.”
This cognitive bias leads us to over-ascribe intelligence and intention to things that act in ways that seem purposeful.
In the 20th century, people attributed intelligence to thermostats (“it knows the room is too hot”) and to ELIZA-like chatbots <https://en.wikipedia.org/wiki/ELIZA>. Now we do the same with large language models like ChatGPT. The outputs feel coherent, fluent, and often insightful, so we instinctively project mind onto them. And this is a productive move: recall Daniel Dennett’s “intentional stance”, the observation that we have often found it useful to model complex systems (or simple ones: thermostats!) as having beliefs and desires, not because they do, but because taking that stance is a quick and effective way to begin predicting their behavior.
Thus, as our technologies increase in productivity and complexity, we find that this over-ascription generates cognitive dissonance: we struggle to grasp how machines can do what they do without being “like us”. And when a machine completes a task that once required a skilled human, such as writing a letter, playing chess, or composing music, people start asking not “how does it work?” but “who is in there?”
What are the consequences of placing MAMLMs in the wrong box? Of thinking of them as rapidly becoming human-level replacements for human beings, rather than as powerful new complements to human minds, offloading a great deal of mechanical repetitive work, boilerplate generation, formula application, and ritual enactment?
The first consequence is that it makes it easy to take a huge amount of money from gullible rich AI enthusiasts.
If reading this gets you Value Above Replacement, then become a free subscriber to this newsletter. And forward it! And if your VAR from this newsletter is in the three digits or more each year, please become a paid subscriber! I am trying to make you readers—and myself—smarter. Please tell me if I succeed, or how I fail…