The Nonlinear Library: Alignment Forum

AF - FixDT by Abram Demski


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: FixDT, published by Abram Demski on November 30, 2023 on The AI Alignment Forum.
FixDT is not a very new decision theory, but little has been written about it afaict, and it's interesting. So I'm going to write about it.
TJ asked me to write this article to "offset" not engaging with Active Inference more. The name "fixDT" is due to Scott Garrabrant, and stands for "fixed-point decision theory". Ideas here are due to Scott Garrabrant, Sam Eisenstat, me, Daniel Hermann, TJ, Sahil, and Martin Soto, in roughly that priority order; but heavily filtered through my own lens.
This post may provide some useful formalism for thinking about issues raised in The Parable of Predict-O-Matic.
Self-fulfilling prophecies & other spooky map-territory connections.
A common trope is for magic to work only when you believe in it. For example, in Harry Potter, you can only get to the magical train platform 9¾ if you believe that you can pass through the wall to get there.
A plausible normative-rationality rule, when faced with such problems: if you want the magic to work, you should believe that it will work (and you should not believe it will work, if you want it not to work).
Can we sketch a formal decision theory which handles such problems?
We can't start by imagining that the agent has a prior probability distribution, as we normally would, since the agent would already be stuck -- either it lucked into a prior which believed the magic could work, or it didn't.
Instead, the "beliefs" of the agent start out as maps from probability distributions to probability distributions. I'll use "P" as the type for probability distributions (little p for a specific probability distribution). So the type of "beliefs", B, is a function type, B = P → P (little b for a specific belief). You can think of these as "map-territory connections": b is a (causal?) story about what actually happens, if we believe p. A "normal" prior, where we don't think our beliefs influence the world, would just be a constant function: it always outputs the same p no matter what the input is.
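To make the types concrete, here is a minimal sketch of my own (not from the post), specialized to a single binary proposition such as "the magic works": a distribution then collapses to one number p in [0, 1], and a belief is a map from [0, 1] to [0, 1].

from typing import Callable

# Toy setting: a distribution over one binary proposition is a single number
# p in [0, 1], so the belief type B = P -> P becomes a map [0, 1] -> [0, 1].
Belief = Callable[[float], float]

# A "normal" prior: a constant function that ignores what the agent believes.
def normal_belief(p: float) -> float:
    return 0.2  # made-up number: the magic works with probability 0.2 regardless

# A self-fulfilling belief: the magic works exactly as often as you expect it to,
# so every p in [0, 1] is a fixed point of this belief.
def self_fulfilling_belief(p: float) -> float:
    return p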
Given a belief b, the agent then somehow settles on a probability distribution p. We can now formalize our rationality criteria:
Epistemic Constraint: The probability distribution p which the agent settles on cannot be self-refuting according to the beliefs. It must be a fixed point of b: a p such that b(p)=p.
Instrumental Constraint: Out of the options allowed by the epistemic constraint, p should be as good as possible; that is, it should maximize expected utility: p := argmax_{p : b(p) = p} E_p[U].
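Continuing the toy one-proposition sketch from above (my own illustration, with made-up utilities), both constraints can be satisfied by brute force: keep only the (approximate) fixed points of b on a grid over [0, 1], then pick the one with the highest expected utility.

# Sketch: epistemic constraint = keep only (approximate) fixed points of b;
# instrumental constraint = among those, maximize expected utility.
def fixdt_choice(b, utility_if_true, utility_if_false, grid_size=10001, tol=1e-6):
    candidates = [i / (grid_size - 1) for i in range(grid_size)]
    fixed_points = [p for p in candidates if abs(b(p) - p) <= tol]  # epistemic constraint
    def expected_utility(p):
        return p * utility_if_true + (1 - p) * utility_if_false
    return max(fixed_points, key=expected_utility)                  # instrumental constraint

# The self-fulfilling belief (b(p) = p) leaves every p available, so the agent
# simply believes whatever is best -- here, that the magic works:
print(fixdt_choice(lambda p: p, utility_if_true=1.0, utility_if_false=0.0))    # -> 1.0
# The constant "normal" prior (b(p) = 0.2) has a single fixed point, so that is forced:
print(fixdt_choice(lambda p: 0.2, utility_if_true=1.0, utility_if_false=0.0))  # -> 0.2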
We can also require that b be a continuous function, to guarantee the existence of a fixed point[1], so that the agent is definitely able to satisfy these requirements. This might seem like an arbitrary requirement, from the perspective where b is a story about map-territory connections; why should they be required to be continuous? But remember that b is representing the subjective belief-formation process of the agent, not a true objective story. Continuity can be thought of as a limit to the agent's own self-knowledge.
For example, the self-referential statement X: "p(X) < 1/2" suggests an "objectively true" belief which maps p(X) to 1 if it's below 1/2, and maps it to 0 if it's above or equal to 1/2. But this belief has no fixed point; an agent with this belief cannot satisfy the epistemic constraint on its rationality. If we require b to be continuous, we can only approximate the "objectively true" belief function, by rapidly but not instantly transitioning from 1 to 0 as p(X) rises from slightly less than 1/2 to slightly more.
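To make that concrete, here is a small sketch of my own (the steepness constant is arbitrary): the discontinuous "objectively true" belief has no fixed point, while a steep continuous approximation has one at exactly p = 1/2.

import math

# The "objectively true" belief for X = "p(X) < 1/2": discontinuous, and
# there is no p with b(p) = p (below 1/2 it outputs 1, at or above 1/2 it outputs 0).
def discontinuous_belief(p: float) -> float:
    return 1.0 if p < 0.5 else 0.0

# A continuous approximation: a steep logistic curve dropping from ~1 to ~0 around p = 1/2.
def steep_belief(p: float, k: float = 1000.0) -> float:
    return 1.0 / (1.0 + math.exp(k * (p - 0.5)))

# By symmetry, steep_belief(0.5) = 0.5 exactly, so the epistemic constraint is satisfiable:
print(steep_belief(0.5))  # -> 0.5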
These "beliefs" are a lot like "trading strategies" from Garrabrant Induction.
We can also replace the continuity requirement with a Kakutani requirement, to get something more like Paul's self-referential probabili...