The Nonlinear Library

AF - Why do we care about agency for alignment? by Chris Leong



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why do we care about agency for alignment?, published by Chris Leong on April 23, 2023 on The AI Alignment Forum.
Many people believe that understanding "agency" is crucial for alignment, but as far as I know, there isn't a canonical list of reasons why we care about agency. Please describe below any reasons why we might care about the concept of agency for understanding alignment. If you have multiple reasons, please list them in separate answers.
Please also try to be as specific as possible about what our goal is in the scenario. For example, instead of just saying, "We want to know what an agent is so that we can determine whether or not a given AI is a dangerous agent", it might be better to say something along the lines of, "We have an AI which may or may not have goals aligned with ours, and we want to know how dangerous it would be if it weren't aligned. We want to use interpretability tools to determine the extent to which it will pursue instrumental incentives, and this would be easier if we knew exactly what we were looking for. In particular, we can set up a dichotomy between agents which have only learned to pursue instrumental incentives based upon a few heuristics they learned during training and agents which maximise their goals during deployment. We would like the ability to figure out where a particular agent falls on this dichotomy."
In a few days, I'll add any use cases I'm aware of myself that either haven't been covered or that I don't think have been adequately explained by different answers.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library, by The Nonlinear Fund
