Future of Life Institute Podcast

AIAP: Synthesizing a human's preferences into a utility function with Stuart Armstrong


In his Research Agenda v0.9: Synthesizing a human's preferences into a utility function, Stuart Armstrong develops an approach for generating friendly artificial intelligence. His alignment proposal can broadly be understood as a kind of inverse reinforcement learning where most of the task of inferring human preferences is left to the AI itself. It's up to us to build the correct assumptions, definitions, preference learning methodology, and synthesis process into the AI system such that it will be able to meaningfully learn human preferences and synthesize them into an adequate utility function. In order to get this all right, his agenda looks at how to understand and identify human partial preferences, how to ultimately synthesize these learned preferences into an "adequate" utility function, the practicalities of developing and estimating the human utility function, and how this agenda can assist in other methods of AI alignment.
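For readers who want a concrete handle on what "synthesizing partial preferences into a utility function" might even look like, here is a deliberately minimal Python sketch. It is not Armstrong's actual method — his agenda works with preferences inside a human's internal mental models and involves far more careful extension and normalization steps — but it shows the basic shape discussed in the episode: partial preferences as weighted comparisons between features of worlds, normalized so no single preference dominates, and aggregated into one function over worlds. All names, the data structure, and the aggregation rule are assumptions made purely for illustration.

```python
# Toy, illustrative sketch only (not Armstrong's method): each "partial
# preference" is a comparison between two world-features, with a weight
# standing in for how strongly the preference is held. All names hypothetical.

from dataclasses import dataclass

@dataclass
class PartialPreference:
    better: str    # world-feature the human prefers...
    worse: str     # ...over this alternative
    weight: float  # stand-in for the strength of the preference

def synthesize_utility(prefs: list[PartialPreference]):
    """Return a crude utility function over sets of world-features."""
    def utility(world_features: set[str]) -> float:
        total = 0.0
        for p in prefs:
            # A satisfied preference contributes +weight, a violated one
            # -weight; dividing by the total weight normalizes so that
            # loudly-held preferences cannot swamp everything else.
            if p.better in world_features:
                total += p.weight
            if p.worse in world_features:
                total -= p.weight
        norm = sum(p.weight for p in prefs) or 1.0
        return total / norm
    return utility

prefs = [
    PartialPreference("people_flourishing", "people_suffering", weight=5.0),
    PartialPreference("honest_ai", "deceptive_ai", weight=3.0),
]
u = synthesize_utility(prefs)
print(u({"people_flourishing", "honest_ai"}))  # 1.0
print(u({"people_suffering", "honest_ai"}))    # -0.25
```

The hard parts of the agenda — identifying partial preferences in the first place, handling contradictions, and extending underdefined values to novel situations — are exactly what this toy version leaves out, and they come up repeatedly in the conversation below.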
Topics discussed in this episode include:
-The core aspects and ideas of Stuart's research agenda
-Human values being changeable, manipulable, contradictory, and underdefined
-This research agenda in the context of the broader AI alignment landscape
-What the proposed synthesis process looks like
-How to identify human partial preferences
-Why a utility function anyway?
-Idealization and reflective equilibrium
-Open questions and potential problem areas
Here you can find the podcast page: https://futureoflife.org/2019/09/17/synthesizing-a-humans-preferences-into-a-utility-function-with-stuart-armstrong/
Important timestamps: 
0:00 Introductions 
3:24 A story of evolution (inspiring just-so story)
6:30 How does your “inspiring just-so story” help to inform this research agenda?
8:53 The two core parts to the research agenda 
10:00 How this research agenda is contextualized in the AI alignment landscape
12:45 The fundamental ideas behind the research project 
15:10 What are partial preferences? 
17:50 Why reflexive self-consistency isn’t enough 
20:05 How are humans contradictory and how does this affect the difficulty of the agenda?
25:30 Why human values being underdefined presents the greatest challenge 
33:55 Expanding on the synthesis process 
35:20 How to extract the partial preferences of the person 
36:50 Why a utility function? 
41:45 Are there alternative goal ordering or action producing methods for agents other than utility functions?
44:40 Extending and normalizing partial preferences and covering the rest of section 2 
50:00 Moving into section 3, synthesizing the utility function in practice 
52:00 Why this research agenda is helpful for other alignment methodologies 
55:50 Limits of the agenda and other problems 
58:40 Synthesizing a species wide utility function 
1:01:20 Concerns over the alignment methodology containing leaky abstractions 
1:06:10 Reflective equilibrium and the agenda not being a philosophical ideal 
1:08:10 Can we check the result of the synthesis process?
1:09:55 How did the Mahatma Armstrong idealization process fail?
1:14:40 Any clarifications for the AI alignment community?
You can take a short (4 minute) survey to share your feedback about the podcast here: www.surveymonkey.com/r/YWHDFV7