
There are three main ways to try to understand and reason about powerful future AGI agents:
I think it's valuable to try all three approaches. Today I'm exploring strategy #3, building an extended analogy between:
---
Outline:
(01:29) The Analogy
(01:52) What happens when training incentives conflict with goals/principles
(08:14) Appendix: Three important concepts/distinctions
(08:38) Goals vs. Principles
(09:39) Contextually activated goals/principles
(12:32) Stability and/or consistency of goals/principles
---
First published:
Source:
Narrated by TYPE III AUDIO.