September 05, 2023

EA - Strongest real-world examples supporting AI risk claims? by rosehadshar

1 minute

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Strongest real-world examples supporting AI risk claims?, published by rosehadshar on September 5, 2023 on The Effective Altruism Forum.

[Manually cross-posted to LessWrong here.]

There are some great collections of examples of things like specification gaming, goal misgeneralization, and AI improving AI. But almost all of the examples are from demos/toy environments, rather than systems which were actually deployed in the world.

There are also some databases of AI incidents which include lots of real-world examples, but the examples aren't related to failures in a way that makes it easy to map them onto AI risk claims. (Most of them probably don't map onto those claims in any case, but I'd guess some do.)

I think collecting real-world examples (particularly in a nuanced way, without claiming too much from the examples) could be pretty valuable:

I think it's good practice to have a transparent overview of the current state of evidence

For many people I think real-world examples will be most convincing

I expect there to be more and more real-world examples, so starting to collect them now seems good

What are the strongest real-world examples of AI systems doing things which might scale to AI risk claims?

I'm particularly interested in whether there are any good real-world examples of:

Goal misgeneralization

Deceptive alignment (answer: no, but yes to simple deception?)

Specification gaming

Power-seeking

Self-preservation

Self-improvement

This feeds into a project I'm working on with AI Impacts, collecting empirical evidence on various AI risk claims. There's a work-in-progress table here with the main things I'm tracking so far - additions and comments very welcome.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org


By The Nonlinear Fund

