The Nonlinear Library: Alignment Forum

AF - A Common-Sense Case For Mutually-Misaligned AGIs Allying Against Humans by Thane Ruthenis


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Common-Sense Case For Mutually-Misaligned AGIs Allying Against Humans, published by Thane Ruthenis on December 17, 2023 on The AI Alignment Forum.
Consider a multipolar-AGI scenario. The hard-takeoff assumption turns out to be wrong, and none of the AI Labs have a significant lead on the others. We find ourselves in a world in which there are a lot of roughly-similarly-capable AGIs. Or perhaps one of the labs does have a lead, but they deliberately instantiate several AGIs simultaneously, as part of a galaxy-brained alignment strategy.
Regardless. Suppose that the worries about these AGIs' internal alignment haven't been properly settled, so we're looking for additional guarantees. We know that they'll soon advance to superintelligences/ASIs, beyond our ability to easily oversee or out-plot. What can we do?
An idea sometimes floated around is to play them off against each other. If they're misaligned from humanity, they're likely mutually misaligned as well. We could put them in game-theoretic situations in which they're incentivized to defect against each other and instead cooperate with humans.
Various supervision setups are the most obvious option. Sure, if an ASI is supervising another ASI, they would be able to conspire together. But why would they? They have no loyalty to each other either! And if we place them in a lot of situations where they must defect against someone - well, even if we leave the choice of target completely to chance, in half of those scenarios the one they side with might end up being humanity! And much more often if we stack the deck in our favour.
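To make that hope concrete, here's a minimal toy sketch, in Python, of the coin-flip reasoning: the supervising ASI must defect against either its peer or humanity, with some probability of picking the peer. The function names and probabilities are illustrative assumptions, not anything from a real setup.

```python
import random

# Toy sketch of the naive "play them off against each other" hope.
# All names and numbers here are illustrative assumptions, not from the post.

def supervision_round(p_defect_on_peer: float) -> bool:
    """One oversight episode: the supervising ASI must defect against someone.

    With probability p_defect_on_peer it defects against the ASI it oversees
    (siding with humans); otherwise it conspires with its peer against humans.
    Returns True if humanity comes out ahead this round.
    """
    return random.random() < p_defect_on_peer

def humanity_win_rate(p_defect_on_peer: float, rounds: int = 100_000) -> float:
    wins = sum(supervision_round(p_defect_on_peer) for _ in range(rounds))
    return wins / rounds

# Left completely to chance: humanity benefits roughly half the time.
print(f"coin flip:    {humanity_win_rate(0.5):.1%}")
# "Stacking the deck" (extra incentives to report the peer), in the naive model:
print(f"stacked deck: {humanity_win_rate(0.8):.1%}")
```

Note that this toy model treats each ASI's choice as an independent weighted coin that our incentives get to bias, which is exactly the assumption the rest of the argument below attacks.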
And so, although we'll have a whole bunch of superhuman intelligences floating around, we'll retain some control over the situation, and skim a ton of value off the top!
Yeah, no.
1. The Classical Arguments
The usual counter-arguments to this view are acausal coordination based on logical decision theories, and AIs establishing mutual trust by inspecting each other's code. I think those are plausible enough... but also totally unnecessary.
Allow me to outline them first - for completeness' sake, and also because they're illustrative (but extreme) instances of my larger point. (I guess skip to Section 2 onwards if you really can't stand them. I think I'm arguing them more plainly than they're usually argued, though.)
1. The LDT stuff goes as follows: By definition, inasmuch as the ASIs would be superintelligent, they would adopt better reasoning procedures. And every biased thinker is biased in their own way, but quality thinkers would reason in increasingly similar ways.
Why? It's inherent in the structure of the world.
Reasoning algorithms' purpose is to aid decision-making. For a given combination of object-level situation + goals, there's a correct action to take to achieve your goals with the highest probability. To an omniscient observer, that action would be obvious.
As such, making decisions isn't really a matter of choice: it's a matter of prediction. Inasmuch as you improve your decision-making, then, you'd be tweaking your cognitive algorithms to output increasingly accurate, true-to-reality probability distributions over which actions would best advance your goals.
And there's only one ground truth. Consequently, no matter their starting points, each ASI would converge towards similar cognition (and, in the limit, likely equivalent cognition).
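To illustrate the "deciding is predicting" step, here's a minimal sketch, assuming a shared ground-truth model of P(goal achieved | action) with made-up action names and probabilities: once the model is fixed, the "decision" is just an argmax over it, so any two reasoners who have converged on that model output the same action.

```python
# Illustrative sketch: with a shared ground-truth model P(goal achieved | action),
# the "correct action" is fully determined, so good reasoners converge on it.
# The action names and probabilities below are made up for illustration.

GROUND_TRUTH = {
    "ally_with_peer_asi": 0.70,
    "cooperate_with_humans": 0.40,
    "stall": 0.05,
}

def decide(world_model: dict[str, float]) -> str:
    """Decision-making as prediction: pick the action the model rates highest."""
    return max(world_model, key=world_model.get)

# Two ASIs whose world models have both converged toward the ground truth
# make the same choice without ever communicating.
assert decide(GROUND_TRUTH) == decide(dict(GROUND_TRUTH))
print(decide(GROUND_TRUTH))  # -> ally_with_peer_asi
```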
Thus, as a direct by-product of ASIs being better reasoners than humans, their cognition would be more similar to each other. Which, in turn, would let a given ASI better predict what any other ASI would be thinking and doing, compared to a human trying to predict another human or an ASI. The same way you'd be better able to predict how your identical copy would act, compared to a stranger.
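A standard toy rendering of this point, assuming a highly simplified "twin prisoner's dilemma" with hypothetical procedure labels: an agent that recognises its counterpart as running the same decision procedure can safely cooperate with it, while defecting against anyone it cannot identify with, such as a human overseer.

```python
# Toy twin-prisoner's-dilemma sketch (illustrative assumption, not from the post):
# cooperation hinges on recognising the counterpart as a copy of your own procedure.

def converged_reasoner(counterpart_procedure: str) -> str:
    """Cooperate exactly when the counterpart runs the same (converged) procedure."""
    return "cooperate" if counterpart_procedure == "converged_reasoner" else "defect"

# Two ASIs whose cognition has converged recognise each other and cooperate:
print(converged_reasoner("converged_reasoner"))  # -> cooperate
# A human overseer, whose reasoning they can't identify with their own, gets defection:
print(converged_reasoner("human_overseer"))      # -> defect
```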
Indeed, in a sense, by way of sharing the decision-making algorithms, each individual ASI would be able to...