Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Instrumental Convergence Bounty, published by Logan Zoellner on September 14, 2023 on LessWrong.
I have yet to find a real-world example on which I can test my corrigibility definition. Hence, I will send $100 to the first person who can send or show me an example of instrumental convergence that is:
Surprising, in the sense that the model was trained on a goal other than "maximize money," "maximize resources," or "take over the world"
Natural, in the sense that the instrumental convergence arose while the agent was pursuing some other objective, not because the setup was designed in advance to demonstrate instrumental convergence
Reproducible, in the sense that I could plausibly run the model, the environment, and whatever else is needed to show the instrumental convergence on a box I can rent from Lambda (a rough sketch of what a runnable submission might look like follows below)
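To make the "reproducible" bar concrete, here is a minimal sketch of the shape a submission could take: a single self-contained script that builds the environment, trains the agent, and rolls out the trained policy so the claimed behavior can be observed. The particular libraries (Gymnasium and Stable-Baselines3), the CartPole environment, and the training budget are stand-ins chosen purely for illustration; any equivalent stack that I could rerun end-to-end would do.
```python
# Minimal sketch of a self-contained, rerunnable submission: build the
# environment, train the agent, and roll out the trained policy so the
# claimed behavior can be observed. Environment, algorithm, and budget
# here are illustrative placeholders, not requirements.
import gymnasium as gym
from stable_baselines3 import PPO


def main():
    # Stand-in environment; a real submission would swap in whatever
    # environment the instrumentally convergent behavior shows up in.
    env = gym.make("CartPole-v1")

    # Off-the-shelf training, sized to fit on a single rented box.
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=50_000)
    model.save("trained_agent")  # artifact that can be reloaded and inspected

    # Roll out the trained policy deterministically to exhibit the behavior.
    vec_env = model.get_env()
    obs = vec_env.reset()
    for _ in range(1_000):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, info = vec_env.step(action)


if __name__ == "__main__":
    main()
```
The point is not the specific algorithm or environment, just that everything needed to reproduce the claimed behavior is packaged so that someone else can rerun it.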
Example of what would count as a valid solution:
I was training an agent to pick apples in Terraria and it took over the entire world in order to convert it into a massive apple orchard
Example that would fail because it is not surprising:
I trained an agent to play CIV IV and it took over the world
Example that would fail because it is not natural:
After reading your post, I created a toy model where an agent told to pick apples tiles the world with apple trees
Example that would fail because it is not reproducible:
The US military did a simulation in which they trained a drone that decided to take out its operator
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org