The Nonlinear Library: Alignment Forum

AF - Counting arguments provide no evidence for AI doom by Nora Belrose


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Counting arguments provide no evidence for AI doom, published by Nora Belrose on February 27, 2024 on The AI Alignment Forum.
Crossposted from the AI Optimists blog.
AI doom scenarios often suppose that future AIs will engage in scheming - planning to escape, gain power, and pursue ulterior motives, while deceiving us into thinking they are aligned with our interests. The worry is that if a schemer escapes, it may seek world domination to ensure humans do not interfere with its plans, whatever they may be.
In this essay, we debunk the counting argument - a central reason to think AIs might become schemers, according to a recent report by AI safety researcher Joe Carlsmith.[1] It's premised on the idea that schemers can have "a wide variety of goals," while the motivations of a non-schemer must be benign by definition. Since there are "more" possible schemers than non-schemers, the argument goes, we should expect training to produce schemers most of the time. In Carlsmith's words:
The non-schemer model classes, here, require fairly specific goals in order to get high reward.
By contrast, the schemer model class is compatible with a very wide range of (beyond episode) goals, while still getting high reward…
In this sense, there are "more" schemers that get high reward than there are non-schemers that do so.
So, other things equal, we should expect SGD to select a schemer.
Scheming AIs, page 17
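Read as a probability claim, the counting step can be made explicit. A minimal reconstruction (our notation, not Carlsmith's) treats every goal configuration compatible with high reward as equally likely, so that with N_schemer such configurations being schemers and N_non-schemer not:

    \[
      P(\text{schemer} \mid \text{high reward})
        \;=\; \frac{N_{\text{schemer}}}{N_{\text{schemer}} + N_{\text{non-schemer}}}
        \;\approx\; 1
      \qquad \text{whenever } N_{\text{schemer}} \gg N_{\text{non-schemer}}.
    \]

The uniform weighting in the denominator is exactly the step we contest below.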
We begin our critique by presenting a structurally identical counting argument for the obviously false conclusion that neural networks should always memorize their training data, while failing to generalize to unseen data. Since the premises of this parody argument are actually stronger than those of the original counting argument, this shows that counting arguments are generally unsound in this domain.
We then diagnose the problem with both counting arguments: they rest on an incorrect application of the principle of indifference, which says that we should assign equal probability to each possible outcome of a random process. The indifference principle is controversial, and is known to yield absurd and paradoxical results in many cases.
We argue that the principle is invalid in general, and show that the most plausible way of resolving its paradoxes also rules out its application to an AI's behaviors and goals.
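A standard illustration of these paradoxes (the classic "cube factory" case, offered here as a stock example rather than one drawn from the texts above): suppose all we know is that a factory produces cubes with side length ℓ somewhere in (0, 1]. Indifference over side length and indifference over volume answer the same question differently:

    \[
      P\!\left(\ell \le \tfrac{1}{2}\right) = \tfrac{1}{2}
      \quad \text{(uniform over side length)}
      \qquad \text{vs.} \qquad
      P\!\left(\ell^{3} \le \tfrac{1}{8}\right) = \tfrac{1}{8}
      \quad \text{(uniform over volume)},
    \]

even though ℓ ≤ 1/2 and ℓ³ ≤ 1/8 describe the same event. Which parameterization one spreads probability over changes the answer, and nothing in the problem privileges one choice.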
More generally, we find that almost all arguments for taking scheming seriously depend on unsound indifference reasoning. Once we reject the indifference principle, there is very little reason left to worry that future AIs will become schemers.
The counting argument for overfitting
Counting arguments often yield absurd conclusions. For example:
Neural networks must implement fairly specific functions in order to generalize beyond their training data.
By contrast, networks that overfit to the training set are free to do almost anything on unseen data points.
In this sense, there are "more" models that overfit than models that generalize.
So, other things equal, we should expect SGD to select a model that overfits.
This isn't a merely hypothetical argument. Prior to the rise of deep learning, it was commonly assumed that models with more parameters than data points would be doomed to overfit their training data. The popular 2006 textbook Pattern Recognition and Machine Learning uses a simple example from polynomial regression: there are infinitely many polynomials of order equal to or greater than the number of data points which interpolate the training data perfectly, and "almost all" such polynomials are terrible at extrapolating to unseen points.
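The textbook's example is easy to reproduce. The sketch below (with illustrative numbers of our own, not the book's exact figure) fits a degree-9 polynomial to 10 noisy samples of sin(2πx): the fit passes almost exactly through the training points, yet its predictions at unseen points, especially just outside the training range, are wildly off.

    # A degree-9 polynomial interpolates 10 training points but extrapolates badly.
    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.linspace(0.0, 1.0, 10)
    y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(x_train.shape)

    # Degree 9 gives the polynomial as many coefficients as there are data points,
    # so it can fit the training set (near-)perfectly.
    coeffs = np.polyfit(x_train, y_train, deg=9)

    # Evaluate on unseen points, some slightly outside the training range.
    x_test = np.linspace(-0.2, 1.2, 8)
    train_err = np.max(np.abs(np.polyval(coeffs, x_train) - y_train))
    test_err = np.max(np.abs(np.polyval(coeffs, x_test) - np.sin(2 * np.pi * x_test)))

    print(f"max train error: {train_err:.2e}")  # tiny: the training data are memorized
    print(f"max test error:  {test_err:.2e}")   # large: extrapolation beyond the data is wild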
Let's see what the overfitting argument predicts in a simple real-world example from Caballero et al. (2022), where a neural network is trained to solve 4-digit addition problems. There are 10,000² = 100,000,000 possible pairs o...
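For scale, that count is quick to check with a short sketch (assuming, as the count of 10,000² implies, that each operand is one of the 10,000 integers from 0 to 9999):

    # Counting the possible 4-digit addition problems, with operands 0-9999 and order mattering.
    n_operands = 10_000              # the integers 0 through 9999
    n_problems = n_operands ** 2     # ordered pairs (a, b)
    assert n_problems == 100_000_000

    # A few sample problems in "a+b=c" form.
    samples = [f"{a}+{b}={a + b}" for (a, b) in [(7, 42), (1234, 5678), (9999, 9999)]]
    print(n_problems, samples)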