
Sign up to save your podcasts
Or
Introduction
I have long felt confused about the question of whether brain-like AGI would be likely to scheme, given behaviorist rewards. …Pause to explain jargon:
---
Outline:
(00:06) Introduction
(03:17) Note on the experimental self-dialogue format
(05:32) ...Let the self-dialogue begin!
(05:36) 1. Debating the scope of the debate: Brain-like AGI with a behaviorist reward function
(15:06) 2. Do behaviorist primary rewards lead to behavior-based motivations?
(20:55) 3. Why don't humans wirehead all the time?
(36:31) 4. Side-track on the meaning of interpretability-based primary rewards
(40:04) 5. Wrapping up the Cookie Story
(42:06) 6. Side-track: does perfect exploration really lead to an explicit desire to wirehead?
(50:07) 7. Imperfect labels
(55:17) 8. Adding more specifics to the scenario
(58:10) 9. Do imperfect labels lead to explicitly caring, vs implicitly caring, vs not caring about human feedback per se?
(01:08:52) 10. The training game
(01:15:06) 11. Three more random arguments for optimism
(01:25:01) 12. Conclusion
The original text contained 4 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
Introduction
I have long felt confused about the question of whether brain-like AGI would be likely to scheme, given behaviorist rewards. …Pause to explain jargon:
---
Outline:
(00:06) Introduction
(03:17) Note on the experimental self-dialogue format
(05:32) ...Let the self-dialogue begin!
(05:36) 1. Debating the scope of the debate: Brain-like AGI with a behaviorist reward function
(15:06) 2. Do behaviorist primary rewards lead to behavior-based motivations?
(20:55) 3. Why don't humans wirehead all the time?
(36:31) 4. Side-track on the meaning of interpretability-based primary rewards
(40:04) 5. Wrapping up the Cookie Story
(42:06) 6. Side-track: does perfect exploration really lead to an explicit desire to wirehead?
(50:07) 7. Imperfect labels
(55:17) 8. Adding more specifics to the scenario
(58:10) 9. Do imperfect labels lead to explicitly caring, vs implicitly caring, vs not caring about human feedback per se?
(01:08:52) 10. The training game
(01:15:06) 11. Three more random arguments for optimism
(01:25:01) 12. Conclusion
The original text contained 4 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
26,334 Listeners
2,399 Listeners
7,859 Listeners
4,107 Listeners
87 Listeners
1,453 Listeners
8,761 Listeners
90 Listeners
353 Listeners
5,356 Listeners
15,023 Listeners
464 Listeners
128 Listeners
73 Listeners
433 Listeners