October 02, 2023

AF - Direction of Fit by Nicholas Kees Dupuis

5 minutes

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Direction of Fit, published by Nicholas Kees Dupuis on October 2, 2023 on The AI Alignment Forum.

This concept has recently become a core part of my toolkit for thinking about the world, and I find it helps explain a lot of things that previously felt confusing to me. Here I explain how I understand "direction of fit," and give some examples of where I find the concept can be useful.

Handshake Robot

A friend recently returned from an artificial life conference and told me about a robot that was designed to perform a handshake. It was given a prior about handshakes, that is, how it expected a handshake to go. When it shook a person's hand, it updated this prior, and the degree to which it updated was determined by a single parameter. If the parameter was set low, the robot would refuse to update, and the handshake would be firm and forceful. If the parameter was set high, the robot would update completely, and the handshake would be passive and weak.
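To make the role of that single parameter concrete, here is a minimal sketch in Python. It is not the conference robot's actual design, which I only know secondhand: the class name, the field names, the one-dimensional "hand position," and the linear update rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class HandshakeRobot:
    """Toy model of the handshake robot (hypothetical, for illustration only)."""

    expected_position: float  # the prior: where the robot expects the hand to be
    update_rate: float        # 0.0 = never update the prior, 1.0 = adopt the observation

    def __post_init__(self) -> None:
        if not 0.0 <= self.update_rate <= 1.0:
            raise ValueError("update_rate must be between 0.0 and 1.0")

    def step(self, observed_position: float) -> float:
        """Update the prior, then return how hard the robot pushes on the world."""
        # Mind adapts to world: move the expectation toward the observation.
        # (Linear interpolation is an assumed update rule.)
        self.expected_position += self.update_rate * (
            observed_position - self.expected_position
        )
        # World adapts to mind: any mismatch that remains is pushed onto the hand.
        return self.expected_position - observed_position


firm = HandshakeRobot(expected_position=1.0, update_rate=0.0)
passive = HandshakeRobot(expected_position=1.0, update_rate=1.0)

firm_push = firm.step(observed_position=0.0)        # prior unchanged, full push
passive_push = passive.step(observed_position=0.0)  # prior replaced, no push
```

In this sketch, the same mismatch between expectation and observation is resolved in one of two places: a low `update_rate` leaves the expectation intact and pushes on the hand, while a high `update_rate` changes the expectation and leaves the hand alone.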

This parameter determines the direction of fit: whether the object in its mind will adapt to match the world, or whether the robot will adapt the world to match the object in its mind. This concept is often used in philosophy of mind to distinguish between a belief, which has a mind-to-world direction of fit, and a desire, which has a world-to-mind direction of fit. In this frame, beliefs and desires are both of a similar type: they both describe ways the world could be. The practical differences only emerge through how they end up interacting with the outside world.

Many objects seem not to be perfectly separable into one of these two categories, and rather appear to exist somewhere on the spectrum. For example:

An instrumental goal can simultaneously be a belief about the world (that achieving the goal will help fulfill some desire) as well as behaving like a desired state of the world in its own right.

Strongly held beliefs (e.g. religious beliefs) are on the surface ideas which are fit to the world, but in practice behave much more like desires, as people make the world around them fit their beliefs.

You can change your mind about what you desire. For example, you may dislike something at first, but after repeated exposure you may come to feel neutral about it, or even actively like it (e.g. the taste of certain foods).

Furthermore, the direction of fit might be context-dependent (e.g. political beliefs), beliefs could be self-fulfilling (e.g. believing that a presentation will go well could make it go well), and many beliefs or desires could refer to other beliefs or desires (wanting to believe, believing that you want, etc.).

Idealized Rational Agents

The concept of a rational agent, in this frame, is a system which cleanly distinguishes between these two directions of fit: between objects which describe how the world actually is, and objects which prescribe how the world "should" be.

This particular concept of a rational agent can itself have a varying direction of fit. You might describe a system as a rational agent to help your expectations match your observations, but the idea might also prescribe that you should develop this clean split between belief and value.

When talking about AI systems, we might be interested in the behavior of systems where this distinction is especially clear. We might observe that many current AI systems are not well described in this way, or we could speculate about pressures which might lead them toward this kind of split.

Note that this is very different from talking about VNM-rationality, which starts by assuming this clean split, and instead demonstrates why we might expect the different parts of the value model to become coherent and avoid getting in each other's way. The direction-of-fit frame highlights a separate (but equally important) question of whether...

By The Nonlinear Fund
