February 19, 2025

“How might we safely pass the buck to AI?” by joshc

1 hour 7 minutes

My goal as an AI safety researcher is to put myself out of a job.

I don’t worry too much about how planet-sized brains will shape galaxies in 100 years. That’s something for AI systems to figure out.

Instead, I worry about safely replacing human researchers with AI agents, at which point human researchers are “obsolete.” The situation is not necessarily fine after human obsolescence; however, the bulk of risks that are addressable by human technical researchers (like me) will have been addressed.

This post explains how developers might safely “pass the buck” to AI.

I first clarify what I mean by “pass the buck” (section 1) and explain why I think AI safety researchers should make safely passing the buck their primary end goal – rather than focus on the loftier ambition of aligning superintelligence (section 2).

Figure 1. A summary of why I think human AI [...]

---

Outline:

(17:27) 1. Briefly responding to objections

(20:06) 2. What I mean by passing the buck to AI

(21:53) 3. Why focus on passing the buck rather than aligning superintelligence

(26:28) 4. Three strategies for passing the buck to AI

(29:16) 5. Conditions that imply that passing the buck improves safety

(32:01) 6. The capability condition

(35:45) 7. The trust condition

(36:36) 8. Argument #1: M_1 agents are approximately aligned and will maintain their alignment until they have completed their deferred task

(45:49) 9. Argument #2: M_1 agents cannot subvert autonomous control measures while they complete the deferred task

(47:06) Analogies to dictatorships suggest that autonomous control might be viable

(48:59) Listing potential autonomous control measures

(52:39) How to evaluate autonomous control

(54:31) 10. Argument #3: Returns to additional human-supervised research are small

(56:47) Control measures

(01:00:52) 11. Argument #4: AI agents are incentivized to behave as safely as humans

(01:07:01) 12. Conclusion

---

First published:

February 19th, 2025

Source:

https://www.lesswrong.com/posts/TTFsKxQThrqgWeXYJ/how-might-we-safely-pass-the-buck-to-ai

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.


By LessWrong
