The Nonlinear Library: Alignment Forum

AF - Free agents by Michele Campolo


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Free agents, published by Michele Campolo on December 27, 2023 on The AI Alignment Forum.
Also posted on the EA Forum.
Shameless attempt at getting your attention:
If you've heard of AI alignment before, this might change your perspective on it. If you come from the field of machine ethics or philosophy, this is about how to create an independent moral agent.
Introduction
The problem of creating an AI that understands human values is often split into two parts: first, expressing human values in a machine-digestible format, or making the AI infer them from human data and behaviour; and second, ensuring the AI correctly interprets and follows these values.
In this post I propose a different approach, closer to how human beings form their moral beliefs. I present a design of an agent that resembles an independent thinker instead of an obedient servant, and argue that this approach is a viable, possibly better, alternative to the aforementioned split.
I've structured the post in a main body, asserting the key points while trying to remain concise, and an appendix, which first expands sections of the main body and then discusses some related work. Although it ended up in the appendix, I think the extended Motivation section is well worth reading if you find the main body interesting.
Without further ado, some more ado first.
A brief note on style and target audience
This post contains a tiny amount of mathematical formalism, which should improve readability for maths-oriented people. Here, the purpose of the formalism is to reduce some of the ambiguities that normally arise with the use of natural language, not to prove fancy theorems. As a result, the post should be readable by pretty much anyone who has some background knowledge in AI, machine ethics, or AI alignment - from software engineers to philosophers and AI enthusiasts (or doomers).
If you are not a maths person, you won't lose much by skipping the maths here and there: I tried to write sentences in such a way that they keep their structure and remain sensible even if all the mathematical symbols are removed from the document. However, this doesn't mean that the content is easy to digest; at some points you might have to stay focused and keep several things in mind at the same time in order to follow.
Motivation
The main purpose of this research is to enable the engineering of an agent which understands good and bad and whose actions are guided by its understanding of good and bad.
I've already given some reasons elsewhere why I think this research goal is worth pursuing. The appendix, under Motivation, contains more information on this topic and on moral agents.
Here I point out that agents which just optimise a metric given by the designer (be it reward, loss, or a utility function) are not suited to the research goal. First, any agent that limits itself to executing instructions given by someone else can hardly be said to have an understanding of good and bad. Second, even if the given instructions were in the form of rules that the designer recognised as moral - such as "Do not harm any human" - and the agent was able to follow them perfectly, the agent's behaviour would still be grounded in the designer's understanding of good and bad, rather than in the agent's own understanding.
This observation leads to an agent design different from the usual fixed-metric optimisation found in the AI literature (loss minimisation in neural networks is a typical example). I present the design in the next section.
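For concreteness, here is a minimal sketch of the kind of fixed-metric optimisation being contrasted against - not the design proposed in this post. The metric (a squared-error loss), the toy one-parameter model, and all names below are illustrative assumptions added for this example; the point is only that the objective is handed to the agent by the designer and the agent never questions it.

```python
# A minimal sketch of fixed-metric optimisation: the objective is chosen
# by the designer in advance, and the "agent" only ever minimises it.
# All names and numbers here are illustrative.

def designer_loss(prediction, target):
    # The metric is fixed by the designer, not by the agent.
    return (prediction - target) ** 2

def optimise(weight, data, learning_rate=0.1, steps=100):
    # Plain gradient descent on the designer-given loss for a
    # one-parameter linear model: prediction = weight * x.
    for _ in range(steps):
        grad = 0.0
        for x, target in data:
            prediction = weight * x
            # d/dw of (w*x - target)^2 = 2 * (w*x - target) * x
            grad += 2.0 * (prediction - target) * x
        weight -= learning_rate * grad / len(data)
    return weight

# Usage: the optimiser dutifully fits the metric it was handed.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(optimise(weight=0.0, data=data))  # converges towards 2.0
```

However well this loop performs, its "values" are exhausted by the loss it was given, which is exactly the limitation the proposed design is meant to move beyond.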
Note that I give neither executable code nor a fully specified blueprint; instead, I just describe the key properties of a possibly broad class of agents. Nonetheless, this post should contain enough information that AI engineers and research scientists reading it could gather at least some ideas on how to cre...