Introduction
I have long felt confused about the question of whether brain-like AGI would be likely to scheme, given behaviorist rewards. …Pause to explain jargon:
---
Outline:
(00:06) Introduction
(03:17) Note on the experimental self-dialogue format
(05:32) ...Let the self-dialogue begin!
(05:36) 1. Debating the scope of the debate: Brain-like AGI with a behaviorist reward function
(15:06) 2. Do behaviorist primary rewards lead to behavior-based motivations?
(20:55) 3. Why don't humans wirehead all the time?
(36:31) 4. Side-track on the meaning of interpretability-based primary rewards
(40:04) 5. Wrapping up the Cookie Story
(42:06) 6. Side-track: does perfect exploration really lead to an explicit desire to wirehead?
(50:07) 7. Imperfect labels
(55:17) 8. Adding more specifics to the scenario
(58:10) 9. Do imperfect labels lead to explicitly caring, vs implicitly caring, vs not caring about human feedback per se?
(01:08:52) 10. The training game
(01:15:06) 11. Three more random arguments for optimism
(01:25:01) 12. Conclusion
The original text contained 4 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong