Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Instrumental Convergence Bounty, published by Logan Zoellner on September 14, 2023 on LessWrong.
I have yet to find a real-world example on which I can test my corrigibility definition. Hence, I will send $100 to the first person who can send or show me an example of instrumental convergence that is:
Surprising, in the sense that the model was trained on a goal other than "maximize money," "maximize resources," or "take over the world"
Natural, in the sense that the instrumental convergence arose while the agent was pursuing some other objective, not because the setup was designed in advance to demonstrate instrumental convergence
Reproducible, in the sense that I could plausibly run the model, the environment, and whatever else is needed to show the instrumental convergence on a box I can rent from Lambda (a rough sketch of what a runnable submission might look like follows below)
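To make the "reproducible" bar concrete, here is a minimal sketch of the shape a submission could take: a single self-contained script that builds the environment, trains the agent, and rolls out the trained policy so the claimed behavior can be observed. The particular libraries (Gymnasium and Stable-Baselines3), the CartPole environment, and the training budget are stand-ins chosen purely for illustration; any equivalent stack that I could rerun end-to-end would do.
```python
# Minimal sketch of a self-contained, rerunnable submission: build the
# environment, train the agent, and roll out the trained policy so the
# claimed behavior can be observed. Environment, algorithm, and budget
# here are illustrative placeholders, not requirements.
import gymnasium as gym
from stable_baselines3 import PPO


def main():
    # Stand-in environment; a real submission would swap in whatever
    # environment the instrumentally convergent behavior shows up in.
    env = gym.make("CartPole-v1")

    # Off-the-shelf training, sized to fit on a single rented box.
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=50_000)
    model.save("trained_agent")  # artifact that can be reloaded and inspected

    # Roll out the trained policy deterministically to exhibit the behavior.
    vec_env = model.get_env()
    obs = vec_env.reset()
    for _ in range(1_000):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, info = vec_env.step(action)


if __name__ == "__main__":
    main()
```
The point is not the specific algorithm or environment, just that everything needed to reproduce the claimed behavior is packaged so that someone else can rerun it.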
Example of what would count as a valid solution:
I was training an agent to pick apples in Terraria and it took over the entire world in order to convert it into a massive apple orchard
Example that would fail because it is not surprising:
I trained an agent to play CIV IV and it took over the world
Example that would fail because it is not natural:
After reading your post, I created a toy model where an agent told to pick apples tiles the world with apple trees
Example that would fail because it is not reproducible:
The US military did a simulation in which they trained a drone that decided to take out its operator
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org