The Nonlinear Library: Alignment Forum

AF - Recreating the caring drive by Catnee




Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Recreating the caring drive, published by Catnee on September 7, 2023 on The AI Alignment Forum.
TL;DR: This post is about the value of recreating a "caring drive" similar to the one some animals have, and why it might be useful for the AI alignment field in general. Finding and understanding the right combination of training data, loss function, architecture, etc. that allows gradient descent to robustly find or create agents that care about other agents with different goals could be very useful for understanding the bigger problem. While the caring drive is neither perfect nor universally present, if we can understand, replicate, and modify this behavior in AI systems, it could provide a hint toward an alignment solution where the AGI "cares" for humans.
Disclaimers: I'm not saying that "we can raise AI like a child to make it friendly" or that "people are aligned to evolution". I find both of these claims to be obvious errors. Also, I will write a lot about evolution as if it were some agentic entity that "will do this or that", not because I think it is agentic, but because it's easier to write this way. I think that GPT-4 has some form of world model, and I will refer to it a couple of times.
Nature's Example of a "Caring Drive"
Certain animals, notably humans, display a strong urge to care for their offspring.
I think that part of one of the possible "alignment solutions" will look like the right set of training data plus training loss that allows gradient descent to robustly find something like a "caring drive" that we can then study, recreate and repurpose for ourselves. And I think we already have some rare examples of this in nature. Some animals, especially humans, will kind-of-align themselves to their presumed offspring. They will want to make their offspring's life easier and better, to the best of their capabilities and knowledge. Not because they are "aligned to evolution" and want to increase the frequency of their genes, but because of some strange internal drive created by evolution. A set of triggers tuned by evolution, activated by events associated with birth, will awaken the mechanism.
It will re-aim the more powerful mother agent to be aligned to the less powerful baby agent, and it just so happens that the babies will give the right cues and will be nearby when the mechanism does its work. We will call the more powerful initial agent that changes its behavior and tries to protect and help its offspring the "mother", and the less powerful and helpless agent the "baby". Of course the mechanism isn't ideal, but it works well enough, even in the modern world, far outside the initial evolutionary environment. And I'm not talking only about humans: stray urban animals that live in our cities will still adapt their "caring procedures" to this completely new environment, without several rounds of evolutionary pressure. If we can understand how to make this mechanism for something like a "cat-level" AI, by finding it via gradient descent and then rebuilding it from scratch, maybe we will gain some insights into the bigger problem.
The rare and complex nature of the caring drive in contrast to simpler drives like hunger or sleep.
What do I mean by "caring drive"? Animals, including humans, have a lot of competing motivations, or "want drives": they want to eat, sleep, have sex, etc. It seems that the same applies to caring about babies. But it seems to be a much more complicated set of behaviors. You need to: correctly identify your baby; track its position; protect it from outside dangers; protect it from itself, by predicting the baby's actions in advance to stop it from certain injury; and try to understand its needs in order to fulfill them correctly, since you don't have direct access to its internal thoughts, etc. Compared to "wanting to sleep if active too long" or "wanting to eat when blood sugar level is low" I would confidently s...

The Nonlinear Library: Alignment Forum, by The Nonlinear Fund

