


Subtitle: Developing AI employees that are safer than human ones.
My goal as an AI safety researcher is to put myself out of a job.
I don’t worry too much about how planet-sized brains will shape galaxies in 100 years. That’s something for AI systems to figure out.
Instead, I worry about safely replacing human researchers with AI agents, at which point human researchers are “obsolete.” The situation is not necessarily fine after human obsolescence; however, the bulk of risks that are addressable by human technical researchers (like me) will have been addressed.
This post explains how developers might safely “pass the buck” to AI.
I first clarify what I mean by “pass the buck” (section 2) and explain why I think AI safety researchers should make safely passing the buck their primary end goal – rather than focus on the loftier ambition of aligning superintelligence (section [...]
---
Outline:
(19:08) 1. Briefly responding to objections
(21:54) 2. What I mean by passing the buck to AI
(23:46) 3. Why focus on passing the buck rather than aligning superintelligence
(28:33) 4. Three strategies for passing the buck to AI
(31:35) 5. Conditions that imply that passing the buck improves safety
(34:32) 6. The capability condition
(38:31) 7. The trust condition
(39:27) 8. Argument #1: M_1 agents are approximately aligned and will maintain their alignment until they have completed their deferred task
(49:28) 9. Argument #2: M_1 agents cannot subvert autonomous control measures while they complete the deferred task
(50:51) Analogies to dictatorships suggest that autonomous control might be viable
(52:51) Listing potential autonomous control measures
(56:19) How to evaluate autonomous control
(58:17) 10. Argument #3: Returns to additional human-supervised research are small
(01:00:38) Control measures
(01:05:03) 11. Argument #4: AI agents are incentivized to behave as safely as humans
(01:11:25) 12. Conclusion
---
Narrated by TYPE III AUDIO.
---
By Redwood Research