By Roland Pihlakas, Sruthi Kuriakose, Shruti Datta Gupta
Summary and Key Takeaways
Many past AI safety discussions have centered on the dangers of unbounded utility maximisation by RL agents, illustrated by scenarios like the "paperclip maximiser". Unbounded maximisation is problematic for many reasons. We wanted to verify whether these RL utility-monster problems remain relevant with LLMs as well.
Strangely, it turns out that this is indeed clearly the case. The problem is not simply that LLMs lose context; rather, in various scenarios they lose context in very specific ways that systematically resemble utility monsters, in the following distinct respects:
Our findings suggest that long-running scenarios are important. Systematic failures emerge after periods of initially successful behaviour. While current LLMs do conceptually grasp [...]
---
Outline:
(00:18) Summary and Key Takeaways
(03:20) Motivation: The Importance of Biological and Economic Alignment
(04:23) Benchmark Principles Overview
(05:41) Experimental Results and Interesting Failure Modes
(08:03) Hypothesised Explanations for Failure Modes
(11:37) Open Questions
(14:03) Future Directions
(15:30) Links
---
Narrated by TYPE III AUDIO.