
In this episode, I chat with David Duvenaud about two topics he's been thinking about: first, a paper he wrote on evaluating whether frontier models can sabotage human decision-making or the monitoring of those same models; and second, the difficult situation humans face in a post-AGI future, even if AI is aligned with human intentions.
Patreon: https://www.patreon.com/axrpodcast
Ko-fi: https://ko-fi.com/axrpodcast
Transcript: https://axrp.net/episode/2025/03/01/episode-38_8-david-duvenaud-sabotage-evaluations-post-agi-future.html
FAR.AI: https://far.ai/
FAR.AI on X (aka Twitter): https://x.com/farairesearch
FAR.AI on YouTube: @FARAIResearch
The Alignment Workshop: https://www.alignment-workshop.com/
Topics we discuss, and timestamps:
01:42 - The difficulty of sabotage evaluations
05:23 - Types of sabotage evaluation
08:45 - The state of sabotage evaluations
12:26 - What happens after AGI?
Links:
Sabotage Evaluations for Frontier Models: https://arxiv.org/abs/2410.21514
Gradual Disempowerment: https://gradual-disempowerment.ai/
Episode art by Hamish Doodles: hamishdoodles.com