
In this episode, I talk with David Lindner about Myopic Optimization with Non-myopic Approval, or MONA, which attempts to address (multi-step) reward hacking by myopically optimizing actions against a human's sense of whether those actions are generally good. Does this work? Can we get smarter-than-human AI this way? How does this compare to approaches like conservatism? Listen to find out.
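For listeners who want the core idea pinned down before pressing play, here is a minimal Python sketch of the contrast MONA draws. It is illustrative only, not the paper's implementation: the names (approval_score, mona_step_rewards, the toy episode) are hypothetical stand-ins. Ordinary RL credits each action with all discounted future reward, which is what makes multi-step reward hacks learnable; MONA instead rewards each action only with a foresighted overseer's approval of that single step.

```python
# Minimal sketch of the idea behind MONA (illustrative; names are hypothetical).

def discounted_returns(rewards, gamma=0.99):
    """Ordinary RL credit assignment: each action is credited with the
    discounted sum of all future rewards, so the optimizer can learn
    multi-step strategies -- including multi-step reward hacks."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

def mona_step_rewards(states, actions, approval_score):
    """MONA-style credit assignment: each action is scored only by a
    non-myopic overseer's approval of that single step ("does this action
    look generally good?"); no reward flows back from later steps."""
    return [approval_score(s, a) for s, a in zip(states, actions)]

if __name__ == "__main__":
    # Toy 3-step episode where the outcome reward arrives only at the end.
    rewards = [0.0, 0.0, 1.0]
    states = ["s0", "s1", "s2"]
    actions = ["a0", "a1", "a2"]
    approval = lambda s, a: 0.5  # stand-in for a human/model approval judgment
    print(discounted_returns(rewards))                    # future reward leaks into early steps
    print(mona_step_rewards(states, actions, approval))   # purely per-step credit
```

The point of the contrast: in the first function, an early action that sets up a later hack still gets credited for the eventual reward; in the second, it can only score well if the overseer's general judgment of that step approves of it.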
Patreon: https://www.patreon.com/axrpodcast
Ko-fi: https://ko-fi.com/axrpodcast
Transcript: https://axrp.net/episode/2025/06/15/episode-43-david-lindner-mona.html
Topics we discuss, and timestamps:
0:00:29 What MONA is
0:06:33 How MONA deals with reward hacking
0:23:15 Failure cases for MONA
0:36:25 MONA's capability
0:55:40 MONA vs other approaches
1:05:03 Follow-up work
1:10:17 Other MONA test cases
1:33:47 When increasing time horizon doesn't increase capability
1:39:04 Following David's research
Links for David:
Website: https://www.davidlindner.me
Twitter / X: https://x.com/davlindner
DeepMind Medium: https://deepmindsafetyresearch.medium.com
David on the Alignment Forum: https://www.alignmentforum.org/users/david-lindner
Research we discuss:
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking: https://arxiv.org/abs/2501.13011
Arguments Against Myopic Training: https://www.alignmentforum.org/posts/GqxuDtZvfgL2bEQ5v/arguments-against-myopic-training
Episode art by Hamish Doodles: hamishdoodles.com
By Daniel Filan