
Sign up to save your podcasts
Or


In this episode, I chat with David Duvenaud about two topics he's been thinking about: firstly, a paper he wrote about evaluating whether or not frontier models can sabotage human decision-making or monitoring of the same models; and secondly, the difficult situation humans find themselves in in a post-AGI future, even if AI is aligned with human intentions.
Patreon: https://www.patreon.com/axrpodcast
Ko-fi: https://ko-fi.com/axrpodcast
Transcript: https://axrp.net/episode/2025/03/01/episode-38_8-david-duvenaud-sabotage-evaluations-post-agi-future.html
FAR.AI: https://far.ai/
FAR.AI on X (aka Twitter): https://x.com/farairesearch
FAR.AI on YouTube: @FARAIResearch
The Alignment Workshop: https://www.alignment-workshop.com/
Topics we discuss, and timestamps:
01:42 - The difficulty of sabotage evaluations
05:23 - Types of sabotage evaluation
08:45 - The state of sabotage evaluations
12:26 - What happens after AGI?
Links:
Sabotage Evaluations for Frontier Models: https://arxiv.org/abs/2410.21514
Gradual Disempowerment: https://gradual-disempowerment.ai/
Episode art by Hamish Doodles: hamishdoodles.com
By Daniel Filan4.4
88 ratings
In this episode, I chat with David Duvenaud about two topics he's been thinking about: firstly, a paper he wrote about evaluating whether or not frontier models can sabotage human decision-making or monitoring of the same models; and secondly, the difficult situation humans find themselves in in a post-AGI future, even if AI is aligned with human intentions.
Patreon: https://www.patreon.com/axrpodcast
Ko-fi: https://ko-fi.com/axrpodcast
Transcript: https://axrp.net/episode/2025/03/01/episode-38_8-david-duvenaud-sabotage-evaluations-post-agi-future.html
FAR.AI: https://far.ai/
FAR.AI on X (aka Twitter): https://x.com/farairesearch
FAR.AI on YouTube: @FARAIResearch
The Alignment Workshop: https://www.alignment-workshop.com/
Topics we discuss, and timestamps:
01:42 - The difficulty of sabotage evaluations
05:23 - Types of sabotage evaluation
08:45 - The state of sabotage evaluations
12:26 - What happens after AGI?
Links:
Sabotage Evaluations for Frontier Models: https://arxiv.org/abs/2410.21514
Gradual Disempowerment: https://gradual-disempowerment.ai/
Episode art by Hamish Doodles: hamishdoodles.com

26,376 Listeners

2,429 Listeners

1,087 Listeners

107 Listeners

112,376 Listeners

211 Listeners

9,799 Listeners

89 Listeners

488 Listeners

5,472 Listeners

132 Listeners

16,144 Listeners

96 Listeners

209 Listeners

131 Listeners