Future of Life Institute Podcast

AIAP: Synthesizing a human's preferences into a utility function with Stuart Armstrong


In his Research Agenda v0.9: Synthesizing a human's preferences into a utility function, Stuart Armstrong develops an approach for generating friendly artificial intelligence. His alignment proposal can broadly be understood as a kind of inverse reinforcement learning where most of the task of inferring human preferences is left to the AI itself. It's up to us to build the correct assumptions, definitions, preference learning methodology, and synthesis process into the AI system such that it will be able to meaningfully learn human preferences and synthesize them into an adequate utility function. In order to get this all right, his agenda looks at how to understand and identify human partial preferences, how to ultimately synthesize these learned preferences into an "adequate" utility function, the practicalities of developing and estimating the human utility function, and how this agenda can assist in other methods of AI alignment.
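For readers who want a concrete handle on what "synthesizing partial preferences into a utility function" might even look like, here is a deliberately minimal Python sketch. It is not Armstrong's actual method — his agenda works with preferences inside a human's internal mental models and involves far more careful extension and normalization steps — but it shows the basic shape discussed in the episode: partial preferences as weighted comparisons between features of worlds, normalized so no single preference dominates, and aggregated into one function over worlds. All names, the data structure, and the aggregation rule are assumptions made purely for illustration.

```python
# Toy, illustrative sketch only (not Armstrong's method): each "partial
# preference" is a comparison between two world-features, with a weight
# standing in for how strongly the preference is held. All names hypothetical.

from dataclasses import dataclass

@dataclass
class PartialPreference:
    better: str    # world-feature the human prefers...
    worse: str     # ...over this alternative
    weight: float  # stand-in for the strength of the preference

def synthesize_utility(prefs: list[PartialPreference]):
    """Return a crude utility function over sets of world-features."""
    def utility(world_features: set[str]) -> float:
        total = 0.0
        for p in prefs:
            # A satisfied preference contributes +weight, a violated one
            # -weight; dividing by the total weight normalizes so that
            # loudly-held preferences cannot swamp everything else.
            if p.better in world_features:
                total += p.weight
            if p.worse in world_features:
                total -= p.weight
        norm = sum(p.weight for p in prefs) or 1.0
        return total / norm
    return utility

prefs = [
    PartialPreference("people_flourishing", "people_suffering", weight=5.0),
    PartialPreference("honest_ai", "deceptive_ai", weight=3.0),
]
u = synthesize_utility(prefs)
print(u({"people_flourishing", "honest_ai"}))  # 1.0
print(u({"people_suffering", "honest_ai"}))    # -0.25
```

The hard parts of the agenda — identifying partial preferences in the first place, handling contradictions, and extending underdefined values to novel situations — are exactly what this toy version leaves out, and they come up repeatedly in the conversation below.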
Topics discussed in this episode include:
-The core aspects and ideas of Stuart's research agenda
-Human values being changeable, manipulable, contradictory, and underdefined
-This research agenda in the context of the broader AI alignment landscape
-What the proposed synthesis process looks like
-How to identify human partial preferences
-Why a utility function anyway?
-Idealization and reflective equilibrium
-Open questions and potential problem areas
Here you can find the podcast page: https://futureoflife.org/2019/09/17/synthesizing-a-humans-preferences-into-a-utility-function-with-stuart-armstrong/
Important timestamps: 
0:00 Introductions 
3:24 A story of evolution (inspiring just-so story)
6:30 How does your “inspiring just-so story” help to inform this research agenda?
8:53 The two core parts to the research agenda 
10:00 How this research agenda is contextualized in the AI alignment landscape
12:45 The fundamental ideas behind the research project 
15:10 What are partial preferences? 
17:50 Why reflexive self-consistency isn’t enough 
20:05 How are humans contradictory and how does this affect the difficulty of the agenda?
25:30 Why human values being underdefined presents the greatest challenge 
33:55 Expanding on the synthesis process 
35:20 How to extract the partial preferences of the person 
36:50 Why a utility function? 
41:45 Are there alternative goal ordering or action producing methods for agents other than utility functions?
44:40 Extending and normalizing partial preferences and covering the rest of section 2 
50:00 Moving into section 3, synthesizing the utility function in practice 
52:00 Why this research agenda is helpful for other alignment methodologies 
55:50 Limits of the agenda and other problems 
58:40 Synthesizing a species wide utility function 
1:01:20 Concerns over the alignment methodology containing leaky abstractions 
1:06:10 Reflective equilibrium and the agenda not being a philosophical ideal 
1:08:10 Can we check the result of the synthesis process?
1:09:55 How did the Mahatma Armstrong idealization process fail?
1:14:40 Any clarifications for the AI alignment community?
You can take a short (4 minute) survey to share your feedback about the podcast here: www.surveymonkey.com/r/YWHDFV7