The Nonlinear Library: Alignment Forum

AF - Box inversion revisited by Jan Kulveit


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Box inversion revisited, published by Jan Kulveit on November 7, 2023 on The AI Alignment Forum.
The box inversion hypothesis is a proposed correspondence between problems with AI systems studied in approaches like agent foundations, and problems with AI ecosystems studied in various views on AI safety expecting multipolar, complex worlds, like CAIS. This is an updated and improved introduction to the idea.
Cartoon explanation

In the classic "superintelligence in a box" picture, we worry about an increasingly powerful AGI, which we imagine as contained in a box. Metaphorically, we worry that the box will, at some point, just blow up in our faces. Classic arguments about AGI then proceed by showing it is really hard to build AGI-proof boxes, and that really strong optimization power is dangerous by default. While the basic view was largely conceived by Eliezer Yudkowsky and Nick Bostrom, it is still the view most technical AI safety is built on, including current agendas like mechanistic interpretability and evals.

In the less famous, though also classic, picture, we worry about an increasingly powerful ecosystem of AI services, automated corporations, etc. Metaphorically, we worry about the ever-increasing optimization pressure "out there", gradually marginalizing people, and ultimately crushing us. Classical treatments of this picture are less famous, but include Eric Drexler's CAIS (Comprehensive AI Services) and Scott Alexander's Ascended Economy. We can imagine scenarios like the human-incomprehensible economy expanding in the universe, and humans and our values being protected by some sort of "box". Agendas based on this view include the work of the AI Objectives Institute and part of ACS work.
The apparent disagreement between these views was sometimes seen as a crux for various AI safety initiatives.
"Box inversion hypothesis" claims:
The two pictures to a large degree depict the same or a very similar situation,
They are related by a transformation which "turns the box inside out", similar to the geometrical transformation of a plane known as circle inversion,
and this metaphor is surprisingly deep and can point to hard parts of some problems.
Geometrical metaphor
The "circle inversion" transformation does not imply that the original and the inverted objects are the same, or that they are located in the same places. What it does imply is that some relations between objects are preserved: for example, if some objects intersect, they will still intersect in the circle-inverted view.
Similarly for "box inversion": the hypothesis does not claim that the AI safety problems in both views are identical, but it does claim that, for most problems, there is a corresponding problem described by the other perspective. Also, while the box-inverted problems may at a surface level look very different, and be located in different places, there will be some deep similarity between the two corresponding problems.
In other words, the box inversion hypothesis suggests that there is a kind of 'mirror image' or 'duality' between two sets of AI safety problems. One set comes from the "Agent Foundations" type of perspective, and the other set comes from the "Ecosystems of AIs" type of perspective.
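For readers who want the geometric side of the metaphor made explicit, here is a sketch of the standard definition (the formula is my addition and is not spelled out in the post): inversion in a circle of radius r centered at a point O maps every point P other than O to a point P' on the ray from O through P, chosen so that the product of distances |OP| · |OP'| equals r².

```latex
% Circle inversion with center O and radius r:
% P is sent to P' on the ray from O through P, with |OP| * |OP'| = r^2.
P' \;=\; O + \frac{r^{2}}{\lVert P - O \rVert^{2}}\,(P - O)
```

Individual points move and distances get distorted under this map, but incidence relations are preserved: two curves that intersect before the inversion still intersect afterwards, which is exactly the property the box inversion metaphor borrows.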
Box-inverted problems
Problems with ontologies and regulatory frameworks [1]
In the classic agent foundations-esque picture, a nontrivial fraction of AI safety challenges are related to issues of similarity, identification, and development of ontologies.
Roughly speaking:
If the AI is using utterly non-human concepts and world models, it becomes much more difficult to steer and control.
If "what humans want" is expressed in human concepts, and the concepts don't extend to novel situations or contexts, then it is unclear how the AI should extend or interpret the human "wants".
Even if an AI initially u...