Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Quick thoughts on the implications of multi-agent views of mind on AI takeover, published by Kaj Sotala on December 11, 2023 on The AI Alignment Forum.
There was a conversation on Facebook about an argument that any sufficiently complex system, whether a human, a society, or an AGI, will be unable to pursue a unified goal due to internal conflict among its parts, and that this should make us less worried about "one paperclipper"-style AI FOOM scenarios. Here's a somewhat edited and expanded version of my response:
1) Yes, this is a very real issue.
2) Yet, as others pointed out, humans and organizations are still largely able to act as if they had unified goals, even if they often also act contrary to those goals.
3) There's a lot of variance in how unified any given human is. Trauma makes you less unified, while practices such as therapy and certain flavors of meditation can make a person significantly more unified than they used to be. If you were intentionally designing a mind, you could create mechanisms that artificially mimicked the results of these practices.
4) A lot of human inconsistency looks like it has actually been evolutionarily adaptive for social purposes. E.g. if your social environment punishes you for having a particular trait or belief, then it's adaptive to suppress that trait or belief to avoid punishment, while also retaining the desire to express it when you can get away with it. This then manifests as what could be seen as conflicting sub-agents, with internal conflict and inconsistent behaviour.
5) At the same time, there is still the genuine issue of complexity and unpredictability making it hard to get a complex mind to work coherently and stay internally aligned. I think that evolution has done a lot of trial and error to set up parameters that result in brain configurations where people end up acting in a relatively sensible way - and even after all that trial and error, lots of people today still end up with serious mental illnesses, failing to achieve almost any of the goals they have (even when the environment isn't stacked against them), dying young from doing something that they predictably shouldn't have, etc.
6) Aligning an AI's sub-agents in a purely simulated environment may not be fully feasible, because a lot of the questions that need to be solved are things like "how much priority to allocate to which sub-agent in which situation". E.g. humans come with lots of biological settings that shift the internal balance of sub-agents when hungry, tired, scared, etc.
Some people develop an obsessive focus on a particular topic, which may end up being beneficial if they are lucky (an obsession with programming that you can turn into a career) or harmful if they are unlucky (an obsession with anything that doesn't earn you money and actively distracts you from earning it). The optimal prioritization depends on the environment, and I don't think there is any theoretically optimal result that would be real-world relevant and that you could calculate beforehand. Rather, you just have to do trial and error, and while running your AIs in a simulated environment may help a bit, it may not help much if your simulation doesn't sufficiently match the real world.
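To make the "priority allocation" point concrete, here is a minimal toy sketch in Python. All of the sub-agent names and coefficients are made-up assumptions for illustration, not a proposal for how an actual AI would or should be built.

```python
import math

# Toy sketch: hypothetical sub-agents whose influence depends on internal state.
# The sub-agent names and coefficients below are illustrative assumptions only.
SUBAGENTS = {
    "seek_food":    {"base": 0.2, "hunger": 2.0, "fatigue": 0.0, "fear": -0.5},
    "rest":         {"base": 0.2, "hunger": 0.0, "fatigue": 2.0, "fear": -0.5},
    "avoid_threat": {"base": 0.1, "hunger": 0.0, "fatigue": 0.0, "fear": 3.0},
    "pursue_goal":  {"base": 0.5, "hunger": -0.5, "fatigue": -0.5, "fear": -1.0},
}

def priorities(state):
    """Map internal state (hunger/fatigue/fear, each in [0, 1]) to a softmax
    distribution over how much influence each sub-agent gets this step."""
    scores = {
        name: p["base"]
        + p["hunger"] * state["hunger"]
        + p["fatigue"] * state["fatigue"]
        + p["fear"] * state["fear"]
        for name, p in SUBAGENTS.items()
    }
    total = sum(math.exp(s) for s in scores.values())
    return {name: math.exp(s) / total for name, s in scores.items()}

# A hungry, rested, unafraid agent hands most of the control to "seek_food".
print(priorities({"hunger": 0.8, "fatigue": 0.1, "fear": 0.05}))
```

The argument above is that there is no environment-independent way to get parameters like these provably right in advance; whether they add up to coherent behaviour depends on the environment the system actually ends up in, so you find out by trial and error, in simulation or in the real world.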
7) Humans are susceptible to an internal version of Goodhart's Law, where they optimize for proxy variables like "sense of control over one's environment", and this leads them to do things like playing games or smoking cigarettes to increase their perceived control of the environment without increasing their actual control of it.
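As a toy illustration of that internal-Goodhart dynamic (a minimal sketch; the actions and numbers are invented for the example, not taken from anywhere):

```python
# Toy illustration of internal Goodhart: scoring actions by a proxy ("felt
# sense of control") rather than what the proxy was meant to track ("actual
# control over the environment"). Actions and numbers are made up.
ACTIONS = {
    "fix_the_broken_heating": {"felt_control": 0.3, "actual_control": 0.6},
    "play_a_strategy_game":   {"felt_control": 0.9, "actual_control": 0.0},
    "smoke_a_cigarette":      {"felt_control": 0.5, "actual_control": 0.0},
}

def best_action(actions, metric):
    """Pick the action that maximizes the given metric."""
    return max(actions, key=lambda name: actions[name][metric])

print(best_action(ACTIONS, "felt_control"))    # play_a_strategy_game
print(best_action(ACTIONS, "actual_control"))  # fix_the_broken_heating
```

The two rankings diverge as soon as the proxy can be pushed up without moving the thing it was supposed to track.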
I think it's much more likely that an AI will have the same issue than that it will just be able to single-mindedly optimize for a single goal and derive all of its behavior and subgoals from that. Moreover, evolution has put quite a bit of optimization power into developing the right kinds of pr...