The Nonlinear Library: Alignment Forum

AF - How Would an Utopia-Maximizer Look Like? by Thane Ruthenis


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How Would an Utopia-Maximizer Look Like?, published by Thane Ruthenis on December 20, 2023 on The AI Alignment Forum.
When we talk of aiming for the good future for humanity - whether by aligning AGI or any other way - it's implicit that there are some futures that "humanity" as a whole would judge as good. That in some (perhaps very approximate) sense, humanity could be viewed as an agent with preferences, and that our aim is to satisfy said preferences.
But is there a theoretical basis for this? Could there be? What would it look like?
Is there a meaningful frame in which humanity can be viewed as optimizing for its purported preferences across history?
Is it possible or coherent to imagine a wrapper-mind set to the task of maximizing utopia, whose activity we'd actually endorse?
This post aims to sketch out answers to these questions. In the process, it also outlines how my current models of basic value reflection and extrapolation work.
Informal Explanation
Basic Case
Is a utopia that'd be perfect for everyone possible?
The short and obvious answer is no. Our civilization contains omnicidal maniacs and true sadists, whose central preferences are directly at odds with the preferences of most other people. Their happiness is diametrically opposed to other people's.
Less extremely, it's likely that most individuals' absolutely perfect world would fail to perfectly satisfy most others. As a safe example, we could imagine someone who loves pizza, yet really, really hates seafood, to such an extent that they're offended by the mere knowledge that seafood exists somewhere in the world. Their utopia would not have any seafood anywhere - and that would greatly disappoint seafood-lovers. If we now postulate the existence of a pizza-hating seafood-lover...
Nevertheless, there are worlds that would make both of them happy enough. A world in which everyone is free to eat food that's tasty according to their preferences, and is never forced to interact with the food they hate. Both people would still dislike the fact that their hated dishes exist somewhere. But as long as food-hating is not a core value dominating their entire personality, they'd end up happy enough.
Similarly, it intuitively feels like worlds that are strictly better according to most people's entire arrays of preferences are possible. Empowerment is one way to gesture at this - a world in which each individual is simply given more instrumental resources, a greater ability to satisfy whatever preferences they happen to have. (With some limitations on impacting other people, etc.)
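One loose way to formalize "strictly better for most people" - my gloss, not a definition from the post - is as an (approximate) Pareto improvement over the current world $w_0$, where $u_i$ stands in for individual $i$'s preferences:

$$u_i(w') \ge u_i(w_0) \ \text{for (nearly) all } i, \qquad u_i(w') > u_i(w_0) \ \text{for most } i.$$

Empowerment-style proposals aim at such a $w'$ by enlarging each individual's option set, rather than by guessing the contents of each $u_i$.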
But is it possible to arrive at this idea from first principles? By looking at humanity and somehow "eliciting"/"agglomerating" its preferences formally? A process like CEV? A target to hit that's "objectively correct" according to humanity's own subjective values, rather than your subjective interpretation of its values?
Paraphrasing, we're looking for a utility function such that the world-state maximizing it is ranked very high by the standards of most humans' preferences; a utility function that's correlated with the "agglomeration" of most humans' preferences.
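Stated slightly more formally (again my paraphrase, with $u_i$ standing in for individual $i$'s preferences): we want an aggregate $U$ whose maximizer also scores highly under most individual utilities,

$$w^* = \arg\max_w U(w) \quad \text{such that } u_i(w^*) \text{ is high by } i\text{'s own standards, for most } i.$$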
Let's consider what we did in the food example. We discovered two disparate preferences, and then we abstracted up from them - from concrete ideas like "seafood" and "pizza", to an abstraction over them: food-in-general. And we discovered that, although the individuals' preferences disagreed on the concrete level, they ended up basically the same at the higher level. Trivializing, it turned out that a seafood-optimizer and a pizza-optimizer could both be viewed as tasty-food-optimizers.
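As a toy illustration of that "abstracting up" move (my own sketch, with made-up names and numbers, not anything from the post):

# Toy sketch: two agents whose concrete preferences conflict, but who agree
# once preferences are "lifted" to a higher abstraction level.
from dataclasses import dataclass

@dataclass
class World:
    foods_available: frozenset          # e.g. frozenset({"pizza", "seafood"})
    everyone_eats_what_they_like: bool  # the higher-level variable

def pizza_lover_utility(w: World) -> float:
    # Concrete preference: pizza is great, seafood existing anywhere is mildly bad.
    return ("pizza" in w.foods_available) - 0.3 * ("seafood" in w.foods_available)

def seafood_lover_utility(w: World) -> float:
    return ("seafood" in w.foods_available) - 0.3 * ("pizza" in w.foods_available)

def lifted_utility(w: World) -> float:
    # "Abstracting up": both concrete utilities are mostly explained by the
    # higher-level variable "everyone gets tasty food by their own lights".
    return float(w.everyone_eats_what_they_like)

both_foods = World(frozenset({"pizza", "seafood"}), everyone_eats_what_they_like=True)
pizza_only = World(frozenset({"pizza"}), everyone_eats_what_they_like=False)

# Concrete level: the agents disagree about which world is best
# (the pizza-lover prefers pizza_only, the seafood-lover prefers both_foods).
print(pizza_lover_utility(pizza_only), pizza_lover_utility(both_foods))      # 1.0 0.7
print(seafood_lover_utility(pizza_only), seafood_lover_utility(both_foods))  # -0.3 0.7
# Lifted level: both end up "happy enough" in the world the lifted utility picks out.
print(lifted_utility(both_foods), lifted_utility(pizza_only))                # 1.0 0.0

The hand-written lifted_utility is doing all the work in this sketch; the open question from the previous paragraph is whether that abstraction step can be derived from the agents' preferences themselves, rather than written in by hand.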
The hypothesis, then, would go as follows: at some very high abstraction level, the level of global matters and fundamental philosophy, most humans' preferences converge...