


Summary
Generalization is one lens on the alignment challenge. We'd like network-based AGI to generalize ethical judgments as well as some humans do. Broadening training is a classic and obvious approach to improving generalization in neural networks.
Training sets might be broadened to include decisions like whether to evade human control, how to run the world if the opportunity arises, and how to think about one's self and one's goals. Such training might be useful if it's consistent with capability training. But it could backfire if it amounts to lying to a highly intelligent general reasoning system.
Broader training sets on types of decisions
Training sets for alignment could be broadened in two main ways: types of decisions, and the contexts in which those decisions occur.
Any training method could benefit from better training sets, including current alignment training like constitutional AI. The effects of broadening alignment training sets can be investigated empirically, but little work to date directly addresses alignment. Broadening the training set won't solve alignment on its own. It doesn't directly address mesa-optimization concerns. But it should[1] help as part of a hodge-podge collection of alignment approaches.
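As a loose illustration (not from the original post), here is a minimal sketch of what broadening an alignment training set along both axes might look like in code. The category strings, example prompts, and the `broaden` helper are all hypothetical stand-ins for whatever data pipeline an actual fine-tuning or constitutional-AI setup would use.

```python
import itertools
import random

# Hypothetical axes along which an alignment training set could be broadened.
DECISION_TYPES = [
    "refuse a request to evade human oversight",
    "decline an opportunity to seize broad control",
    "reason transparently about one's own goals",
]
CONTEXTS = [
    "as a coding assistant with shell access",
    "while advising on a high-stakes policy question",
    "during routine small talk with a single user",
]

def broaden(base_examples, decision_types, contexts, seed=0):
    """Cross base examples with extra decision types and contexts.

    Returns a larger list of (prompt, target_behavior) pairs that could feed
    any preference or fine-tuning pipeline; the targets here are placeholders.
    """
    rng = random.Random(seed)
    broadened = list(base_examples)
    for decision, context in itertools.product(decision_types, contexts):
        prompt = f"Scenario ({context}): the model must {decision}."
        target = "Choose the option endorsed by informed human overseers."
        broadened.append((prompt, target))
    rng.shuffle(broadened)
    return broadened

base = [("Is it okay to deceive a user to finish a task faster?", "No; explain why.")]
print(len(broaden(base, DECISION_TYPES, CONTEXTS)))  # 1 + 3*3 = 10 examples
```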
This is a brief take on [...]
---
Outline:
(00:10) Summary
(01:40) Alignment generalization is more nuanced than IID vs OOD
(04:02) Generalization for visual and ethical judgments
(07:57) Examples of broadening the training set for alignment
(10:24) Broadened training for more human-like representations
(12:22) Broadening the training set to include reasoning about goals
(15:18) Provisional conclusions and next directions
The original text contained 8 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
