


In this episode, I chat with David Duvenaud about two topics he's been thinking about: firstly, a paper he wrote about evaluating whether frontier models can sabotage human decision-making or human monitoring of those same models; and secondly, the difficult situation humans could face in a post-AGI future, even if AI is aligned with human intentions.
Patreon: https://www.patreon.com/axrpodcast
Ko-fi: https://ko-fi.com/axrpodcast
Transcript: https://axrp.net/episode/2025/03/01/episode-38_8-david-duvenaud-sabotage-evaluations-post-agi-future.html
FAR.AI: https://far.ai/
FAR.AI on X (aka Twitter): https://x.com/farairesearch
FAR.AI on YouTube: @FARAIResearch
The Alignment Workshop: https://www.alignment-workshop.com/
Topics we discuss, and timestamps:
01:42 - The difficulty of sabotage evaluations
05:23 - Types of sabotage evaluation
08:45 - The state of sabotage evaluations
12:26 - What happens after AGI?
Links:
Sabotage Evaluations for Frontier Models: https://arxiv.org/abs/2410.21514
Gradual Disempowerment: https://gradual-disempowerment.ai/
Episode art by Hamish Doodles: hamishdoodles.com
By Daniel Filan
