
By Roland Pihlakas, Sruthi Kuriakose, Shruti Datta Gupta
Summary and Key Takeaways
Many past AI safety discussions have centered on the dangers of unbounded utility maximisation by RL agents, illustrated by scenarios like the "paperclip maximiser". Unbounded maximisation is problematic for many reasons. We wanted to verify whether these RL utility-monster problems are still relevant with LLMs as well.
Strangely, it turns out that this is indeed clearly the case. The problem is not simply that LLMs lose context. Rather, in various scenarios LLMs lose context in very specific ways, which systematically resemble utility monsters in the following distinct ways:
Our findings suggest that long-running scenarios are important: systematic failures emerge after periods of initially successful behaviour. While current LLMs do conceptually grasp [...]
---
Outline:
(00:18) Summary and Key Takeaways
(03:20) Motivation: The Importance of Biological and Economic Alignment
(04:23) Benchmark Principles Overview
(05:41) Experimental Results and Interesting Failure Modes
(08:03) Hypothesised Explanations for Failure Modes
(11:37) Open Questions
(14:03) Future Directions
(15:30) Links
---
First published:
Source:
Narrated by TYPE III AUDIO.