Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: OpenAI Launches Superalignment Taskforce, published by Zvi on July 11, 2023 on LessWrong.
In their announcement Introducing Superalignment, OpenAI committed 20% of secured compute and a new taskforce to solving the technical problem of aligning a superintelligence within four years. Cofounder and Chief Scientist Ilya Sutskever will co-lead the team with Head of Alignment Jan Leike.
This is a real and meaningful commitment of serious firepower. You love to see it. The announcement, the dedication of resources, and the focus on the problem are all great. Especially the stated willingness to learn and modify the approach along the way.
The problem is that I remain deeply, deeply skeptical of the alignment plan. I don’t see how the plan makes the hard parts of the problem easier rather than harder.
I will begin with a close reading of the announcement and my own take on the plan on offer, then go through the reactions of others, including my take on Leike’s other statements about OpenAI’s alignment plan.
A Close Reading
Section: Introduction
Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems. But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction.
While superintelligence seems far off now, we believe it could arrive this decade.
Here we focus on superintelligence rather than AGI to stress a much higher capability level. We have a lot of uncertainty over the speed of development of the technology over the next few years, so we choose to aim for the more difficult target to align a much more capable system.
Excellent. Love the ambition, the admission of uncertainty, and the laying out that alignment of a superintelligent system is fundamentally different from, and harder than, aligning less intelligent AIs, including current systems.
Managing these risks will require, among other things, new institutions for governance and solving the problem of superintelligence alignment: How do we ensure AI systems much smarter than humans follow human intent?
Excellent again. Superalignment is clearly defined and established as necessary for our survival. AI systems much smarter than humans must follow human intent. They also don’t (incorrectly) claim that it would be sufficient.
Bold mine here:
Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us [B], and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs.
[Note B: Other assumptions could also break down in the future, like favorable generalization properties during deployment or our models’ inability to successfully detect and undermine supervision during training.]
Yes, yes, yes. Thank you. Current solutions will not scale. Not ‘may’ not scale. Will not scale. Nor do we know what would work. Breakthroughs are required.
Note B is also helpful; I would say ‘will’ rather than ‘may’.
A+ introduction and framing of the problem. As good as could be hoped for.
Section: Our Approach
Our goal is to build a roughly human-level automated alignment researcher. We can then use vast amounts of compute to scale our efforts, and iteratively align superintelligence.
Oh no.
A human-level automated alignment researcher is an AGI, and also a human-level AI capabilities researcher.
Alignment isn’t a narrow, safe domain that can be isolated. The problem deeply encompasses general skills and knowledge.
It being an AGI is not quite auto...