


Summary: If we validate automated alignment research through empirical testing, the safety assurance work will still need to be done by humans, and will be similar to that needed for human-written alignment algorithms.
Three levels of automated AI safety
Automating AI safety means developing some algorithm which takes in data and outputs safe, highly capable AI systems. Let's imagine three ways of developing this algorithm:
---
Outline:
(00:19) Three levels of automated AI safety
(03:09) A model of empirically checked automated safety
(07:18) We should understand if the collapsed alignment scheme is safe
(09:45) We should ask if less automation is possible
(12:11) Checking the algorithms avoids collapse, but is hard
(14:11) Acknowledgements
The original text contained 1 image which was described by AI.
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong
