The Nonlinear Library

LW - Empathy as a natural consequence of learnt reward models by beren



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Empathy as a natural consequence of learnt reward models, published by beren on February 4, 2023 on LessWrong.
Epistemic Status: Pretty speculative but built on scientific literature. This post builds off my previous post on learnt reward models. Crossposted from my personal blog.
Empathy, the ability to feel another's pain or to 'put yourself in their shoes', is often considered to be a fundamental human cognitive ability, and one that undergirds our social abilities and moral intuitions. As so much of humanity's success and dominance as a species comes down to our superior social organization, empathy has played a vital role in our history. Whether we can build artificial empathy into AI systems also has clear relevance to AI alignment. If we can create empathic AIs, then it may become easier to make an AI receptive to human values, even if humans can no longer completely control it. Such an AI seems unlikely to just callously wipe out all humans to make a few more paperclips. Empathy is not a silver bullet, however. Although (most) humans have empathy, human history is still in large part a history of us waging war against each other, and there are plenty of examples of humans and other animals perpetrating terrible cruelty on enemies and outgroups.
A reasonably large literature has grown up in psychology, cognitive science, and neuroscience studying the neural bases of empathy and its associated cognitive processes. We now know a fair amount about the brain regions involved in empathy, what kinds of tasks can reliably elicit it, how individual differences in empathy work, as well as the neuroscience underlying conditions such as psychopathy, autism, and alexithymia, which result in impaired empathic processing. However, much of this research does not grapple with the fundamental question of why we possess empathy at all. Typically, it seems to be tacitly assumed that, due to its apparent complexity, empathy must be some special cognitive module which has evolved separately and deliberately due to its fitness benefits. From an evolutionary theory perspective, empathy is often assumed to have evolved because of its adaptive function in promoting reciprocal altruism.
The story goes that animals that are altruistic, at least in certain cases, tend to get their altruism reciprocated and may thus tend to out-reproduce other animals that are purely selfish. This would be of especial importance in social species, where being able to form coalitions of like-minded and reciprocating individuals is key to obtaining power and hence reproductive opportunities. If they could, such coalitions would obviously not include purely selfish animals who never reciprocated any benefits they received from other group members. Nobody wants to be in a coalition with an obviously selfish free rider.
Here, I want to argue a different case: namely, that the basic cognitive phenomenon of empathy -- that of feeling and responding to the emotions of others as if they were your own -- is not a special cognitive ability which had to be evolved for its social benefit, but instead is a natural consequence of our (mammalian) cognitive architecture and therefore arises by default. Of course, given this base empathic capability, evolution can expand, develop, and contextualize our natural empathic responses to improve fitness. In many cases, however, evolution actually reduces our native empathic capacity -- for instance, we can contextualize our natural empathy to exclude outgroup members and rivals.
The idea is that empathy fundamentally arises from using learnt reward models to mediate between a low-dimensional set of primary rewards and reinforcers and the high-dimensional latent state of an unsupervised world model. In the brain, much of the cortex is thought to be randomly initialized and implements a general purp...

The Nonlinear Library, by The Nonlinear Fund
