


There are three main ways to try to understand and reason about powerful future AGI agents:
I think it's valuable to try all three approaches. Today I'm exploring strategy #3, building an extended analogy between:
---
Outline:
(01:29) The Analogy
(01:52) What happens when training incentives conflict with goals/principles
(08:14) Appendix: Three important concepts/distinctions
(08:38) Goals vs. Principles
(09:39) Contextually activated goals/principles
(12:32) Stability and/or consistency of goals/principles
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong
