
In this episode, I talk with David Lindner about Myopic Optimization with Non-myopic Approval, or MONA, which attempts to address (multi-step) reward hacking by myopically optimizing actions against a human's sense of whether those actions are generally good. Does this work? Can we get smarter-than-human AI this way? How does this compare to approaches like conservatism? Listen to find out.
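To make the core idea concrete, here is a minimal toy sketch (my illustration, not the paper's implementation): a two-action softmax policy is trained with single-step REINFORCE, where each action is reinforced by a foresighted per-step approval score rather than by the observed multi-step return. The action names, approval scores, and hyperparameters are all invented for illustration.

import math
import random

# Toy illustration of the MONA idea (a sketch, not the paper's code).
# A two-action softmax "policy" is trained with single-step REINFORCE:
# each action is reinforced by a non-myopic judge's APPROVAL of that
# step alone, never by the (hackable) multi-step environment return.
APPROVAL = {"honest": 1.0, "hack": 0.0}  # judge's per-action score (hypothetical)
# ENV_RETURN = {"honest": 1.0, "hack": 5.0}  # observed return: deliberately unused

prefs = {"honest": 0.0, "hack": 0.0}  # policy logits

def policy(prefs):
    # softmax over action preferences
    zs = {a: math.exp(v) for a, v in prefs.items()}
    total = sum(zs.values())
    return {a: z / total for a, z in zs.items()}

for _ in range(5000):
    pi = policy(prefs)
    action = random.choices(list(pi), weights=list(pi.values()))[0]
    advantage = APPROVAL[action] - 0.5  # centered approval as the learning signal
    for a in pi:
        # REINFORCE gradient for a softmax policy, evaluated myopically
        grad_logp = (1.0 if a == action else 0.0) - pi[a]
        prefs[a] += 0.01 * advantage * grad_logp

print(policy(prefs))  # converges to overwhelmingly preferring "honest"

Because the update never bootstraps credit across steps, the policy has no training incentive to set up multi-step reward hacks, which is the failure mode MONA targets.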
Patreon: https://www.patreon.com/axrpodcast
Ko-fi: https://ko-fi.com/axrpodcast
Transcript: https://axrp.net/episode/2025/06/15/episode-43-david-lindner-mona.html
Topics we discuss, and timestamps:
0:00:29 What MONA is
0:06:33 How MONA deals with reward hacking
0:23:15 Failure cases for MONA
0:36:25 MONA's capability
0:55:40 MONA vs other approaches
1:05:03 Follow-up work
1:10:17 Other MONA test cases
1:33:47 When increasing time horizon doesn't increase capability
1:39:04 Following David's research
Links for David:
Website: https://www.davidlindner.me
Twitter / X: https://x.com/davlindner
DeepMind Medium: https://deepmindsafetyresearch.medium.com
David on the Alignment Forum: https://www.alignmentforum.org/users/david-lindner
Research we discuss:
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking: https://arxiv.org/abs/2501.13011
Arguments Against Myopic Training: https://www.alignmentforum.org/posts/GqxuDtZvfgL2bEQ5v/arguments-against-myopic-training
Episode art by Hamish Doodles: hamishdoodles.com
By Daniel Filan
