


In "Clarifying the Agent-Like Structure Problem" (2022), John Wentworth describes a hypothetical instance of what he calls a selection theorem. In Scott Garrabrant's words, the question is: does agent-like behavior imply agent-like architecture? That is, if we take some class of behaving things and apply a filter for agent-like behavior, do we end up selecting things with agent-like architecture (or structure)? Of course, this question is heavily under-specified. So another way to ask it is: under which conditions does agent-like behavior imply agent-like structure? And do those conditions feel like they formally encapsulate a naturally occurring condition?
For the Q1 2024 cohort of AI Safety Camp, I was a Research Lead for a team of six people. We worked a few hours a week to better understand and make progress on this idea. The teammates[1] were Einar Urdshals, Tyler Tracy, Jasmina Nasufi, Mateusz Bagiński, Amaury Lorin, and Alfred [...]
---
Outline:
(01:59) What are agent behavior and agent structure?
(05:09) Motivation
(06:47) A loose formalism
(07:42) The setting
(11:29) The environment class and policy class
(14:30) A measure of performance
(15:24) Filter the policy class
(16:57) Take the limit
(20:10) Alternatives and options for a tighter formalism
(20:36) Setting
(22:11) Performance measure
(23:52) Filtering
(24:21) Other theorem forms
The original text contained 4 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong
