The Nonlinear Library: Alignment Forum

AF - Responses to apparent rationalist confusions about game / decision theory by Anthony DiGiovanni



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Responses to apparent rationalist confusions about game / decision theory, published by Anthony DiGiovanni on August 30, 2023 on The AI Alignment Forum.
I've encountered various claims about how AIs would approach game theory and decision theory that seem pretty importantly mistaken. Some of these confusions probably aren't that big a deal on their own, and I'm definitely not the first to point out several of these, even publicly. But collectively I think these add up to a common worldview that underestimates the value of technical work to reduce risks of AGI conflict. I expect that smart agents will likely avoid catastrophic conflict overall - it's just that the specific arguments for expecting this that I'm responding to here aren't compelling (and seem overconfident).
For each section, I include in the footnotes some examples of the claims I'm pushing back on (or note whether I've primarily seen these claims in personal communication). This is not to call out those particular authors; in each case, they're saying something that seems to be a relatively common meme in this community.
Summary:
The fact that conflict is costly for all the agents involved in the conflict, ex post, doesn't itself imply AGIs won't end up in conflict. Under their uncertainty about each other, agents with sufficiently extreme preferences or priors might find the risk of conflict worth it ex ante.
Solutions to collective action problems, where agents agree on a Pareto-optimal outcome they'd take if they coordinated to do so, don't necessarily solve bargaining problems, where agents may insist on different Pareto-optimal outcomes.
We don't have strong reasons to expect AGIs to converge on sufficiently similar decision procedures for bargaining, such that they coordinate on fair demands despite committing under uncertainty. Existing proposals for mitigating conflict given incompatible demands, while promising, face some problems with incentives and commitment credibility.
The commitment races problem is not just about AIs making commitments that fail to account for basic contingencies. Updatelessness (or conditional commitments generally) seems to solve the latter, but it doesn't remove agents' incentives to limit how much their decisions depend on each other's decisions (leading to incompatible demands).
AIs don't need to follow acausal decision theories in order to (causally) cooperate via conditioning on each other's source code.
Most supposed examples of Newcomblike problems in everyday life don't seem to actually be Newcomblike, once we account for "screening off" by certain information, per the Tickle Defense.
The fact that following acausal decision theories maximizes expected utility with respect to conditional probabilities, or counterfactuals with the possibility of logical causation, doesn't imply that agents with acausal decision theories are selected for (e.g., acquire more material resources).
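To make the second summary point concrete, here is a minimal sketch contrasting a collective action problem with a bargaining problem. The payoff numbers and the names `STAG_HUNT`, `DEMAND_GAME`, and `pareto_optimal` are illustrative assumptions, not taken from the post. In the stag hunt there is a single Pareto-optimal outcome that both agents would pick if they could coordinate; in the demand game there are several, and each agent prefers a different one.

```python
# Illustrative two-player games (payoffs are assumptions for this sketch).
# Keys are (action of agent 1, action of agent 2); values are (u1, u2).
STAG_HUNT = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}
DEMAND_GAME = {
    ("high", "low"): (3, 1),
    ("low", "high"): (1, 3),
    ("low", "low"): (2, 2),
    ("high", "high"): (0, 0),  # incompatible demands lead to conflict
}


def dominates(u, v):
    """True if payoff vector u Pareto-dominates payoff vector v."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_optimal(game):
    """Return the set of action profiles whose payoffs no other profile dominates."""
    return {
        acts
        for acts, u in game.items()
        if not any(dominates(v, u) for v in game.values())
    }


def favorite(game, player):
    """Return the action profile that maximizes the given player's payoff."""
    return max(game, key=lambda acts: game[acts][player])


if __name__ == "__main__":
    print("Stag hunt Pareto optima:", pareto_optimal(STAG_HUNT))
    print("Demand game Pareto optima:", pareto_optimal(DEMAND_GAME))
    # Each agent insists on the action from its own favorite outcome.
    insisted = (favorite(DEMAND_GAME, 0)[0], favorite(DEMAND_GAME, 1)[1])
    print("Both insist:", insisted, "->", DEMAND_GAME[insisted])
```

In this toy setup, solving coordination alone settles the stag hunt, but in the demand game the agents still have to select among the Pareto-optimal outcomes, and if each insists on its own favorite they land on the conflict outcome.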
Ex post optimal =/= ex ante optimal
An "ex post optimal" strategy is one that in fact makes an agent better off than the alternatives, while an "ex ante optimal" strategy is optimal with respect to the agent's uncertainty at the time they choose that strategy. The idea that very smart AGIs could get into conflicts seems intuitively implausible because conflict is, by definition, ex post Pareto-suboptimal. (See the "inefficiency puzzle of war.")
But it doesn't follow that the best strategies available to AGIs, given their uncertainty about each other, will always be ex post Pareto-optimal. This may sound obvious, but people's reactions to the problem of AGI conflict suggest to me that many of them haven't accounted for this important distinction.
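The distinction can be illustrated with a small worked example. This is a sketch under assumed numbers, not a model from the post: agent A makes a take-it-or-leave-it demand for a share of a pie of size 1, agent B is privately either "weak" or "tough" (with different payoffs from conflict), and A only has a prior over B's type. All names and values below (`A_CONFLICT`, `B_CONFLICT`, the demands 4/5 and 1/2, the prior 3/10) are hypothetical.

```python
from fractions import Fraction as F

PIE = F(1)
A_CONFLICT = F(1, 10)  # A's payoff if B rejects and conflict occurs
B_CONFLICT = {"weak": F(1, 10), "tough": F(9, 20)}  # B's conflict payoff by type


def outcome(demand, b_type):
    """Return (A's payoff, B's payoff) when A demands `demand` of the pie.

    B accepts exactly when its leftover share is at least its conflict payoff.
    """
    offer = PIE - demand
    if offer >= B_CONFLICT[b_type]:
        return demand, offer
    return A_CONFLICT, B_CONFLICT[b_type]


def expected_utility_a(demand, p_tough):
    """A's expected payoff given its prior probability that B is tough."""
    u_weak = outcome(demand, "weak")[0]
    u_tough = outcome(demand, "tough")[0]
    return (1 - p_tough) * u_weak + p_tough * u_tough


if __name__ == "__main__":
    p_tough = F(3, 10)
    aggressive, safe = F(4, 5), F(1, 2)
    print("EU of aggressive demand:", expected_utility_a(aggressive, p_tough))
    print("EU of safe demand:", expected_utility_a(safe, p_tough))
    print("Aggressive demand vs tough B:", outcome(aggressive, "tough"))
    print("Safe demand vs tough B:", outcome(safe, "tough"))
```

Under these assumptions the aggressive demand has higher expected utility for A, so it is ex ante optimal given A's prior. But whenever B turns out to be tough, the result is conflict, which leaves both agents worse off than the even split they could have had, so the realized outcome is ex post Pareto-suboptimal.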
As this post discusses in more detail, there are two fundamental sources of u...

The Nonlinear Library: Alignment Forum, by The Nonlinear Fund

