
Sign up to save your podcasts
Or
In Clarifying the Agent-Like Structure Problem (2022), John Wentworth describes a hypothetical instance of what he calls a selection theorem. In Scott Garrabrant's words, the question is, does agent-like behavior imply agent-like architecture? That is, if we take some class of behaving things and apply a filter for agent-like behavior, do we end up selecting things with agent-like architecture (or structure)? Of course, this question is heavily under-specified. So another way to ask this is, under which conditions does agent-like behavior imply agent-like structure? And, do those conditions feel like they formally encapsulate a naturally occurring condition?
For the Q1 2024 cohort of AI Safety Camp, I was a Research Lead for a team of six people, where we worked a few hours a week to better understand and make progress on this idea. The teammates[1] were Einar Urdshals, Tyler Tracy, Jasmina Nasufi, Mateusz Bagiński, Amaury Lorin, and Alfred [...]
---
Outline:
(01:59) What are agent behavior and agent structure?
(05:09) Motivation
(06:47) A loose formalism
(07:42) The setting
(11:29) The environment class and policy class
(14:30) A measure of performance
(15:24) Filter the policy class
(16:57) Take the limit
(20:10) Alternatives and options for a tighter formalism
(20:36) Setting
(22:11) Performance measure
(23:52) Filtering
(24:21) Other theorem forms
The original text contained 4 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
In Clarifying the Agent-Like Structure Problem (2022), John Wentworth describes a hypothetical instance of what he calls a selection theorem. In Scott Garrabrant's words, the question is, does agent-like behavior imply agent-like architecture? That is, if we take some class of behaving things and apply a filter for agent-like behavior, do we end up selecting things with agent-like architecture (or structure)? Of course, this question is heavily under-specified. So another way to ask this is, under which conditions does agent-like behavior imply agent-like structure? And, do those conditions feel like they formally encapsulate a naturally occurring condition?
For the Q1 2024 cohort of AI Safety Camp, I was a Research Lead for a team of six people, where we worked a few hours a week to better understand and make progress on this idea. The teammates[1] were Einar Urdshals, Tyler Tracy, Jasmina Nasufi, Mateusz Bagiński, Amaury Lorin, and Alfred [...]
---
Outline:
(01:59) What are agent behavior and agent structure?
(05:09) Motivation
(06:47) A loose formalism
(07:42) The setting
(11:29) The environment class and policy class
(14:30) A measure of performance
(15:24) Filter the policy class
(16:57) Take the limit
(20:10) Alternatives and options for a tighter formalism
(20:36) Setting
(22:11) Performance measure
(23:52) Filtering
(24:21) Other theorem forms
The original text contained 4 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
26,446 Listeners
2,389 Listeners
7,910 Listeners
4,136 Listeners
87 Listeners
1,462 Listeners
9,095 Listeners
87 Listeners
389 Listeners
5,432 Listeners
15,174 Listeners
474 Listeners
121 Listeners
75 Listeners
459 Listeners