The Nonlinear Library

EA - Strongest real-world examples supporting AI risk claims? by rosehadshar



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Strongest real-world examples supporting AI risk claims?, published by rosehadshar on September 5, 2023 on The Effective Altruism Forum.
[Manually cross-posted to LessWrong here.]
There are some great collections of examples of things like specification gaming, goal misgeneralization, and AI improving AI. But almost all of the examples are from demos/toy environments, rather than systems which were actually deployed in the world.
There are also some databases of AI incidents which include lots of real-world examples, but the examples aren't categorized in a way that makes it easy to map them onto AI risk claims. (Probably most of them don't map onto such claims in any case, but I'd guess some do.)
I think collecting real-world examples (particularly in a nuanced way, without claiming too much from the examples) could be pretty valuable:
I think it's good practice to have a transparent overview of the current state of evidence
For many people I think real-world examples will be most convincing
I expect there to be more and more real-world examples, so starting to collect them now seems good
What are the strongest real-world examples of AI systems doing things which might scale to AI risk claims?
I'm particularly interested in whether there are any good real-world examples of:
Goal misgeneralization
Deceptive alignment (answer: no, but yes to simple deception?)
Specification gaming
Power-seeking
Self-preservation
Self-improvement
This feeds into a project I'm working on with AI Impacts, collecting empirical evidence on various AI risk claims. There's a work-in-progress table here with the main things I'm tracking so far - additions and comments very welcome.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

The Nonlinear Library, by The Nonlinear Fund
