Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Who Aligns the Alignment Researchers?, published by Ben Smith on March 5, 2023 on LessWrong.
There may be an incentives problem for AI researchers and research organizations who face a choice between researching Capabilities, Alignment, or neither. The incentive structure will push individuals and organizations toward Capabilities work rather than Alignment work. The incentives problem is much clearer at the organizational level than at the individual level, but it bears considering at both levels; and, of course, the funding available to organizations has downstream implications for the jobs available to researchers employed to work on Alignment or Capabilities.
In this post, I’ll describe a couple of key moments in the history of AI organizations. I’ll then survey the incentives researchers might have for doing either Alignment work or Capabilities work. We’ll see that it may be that, even assuming normal levels of altruism, the average person would prefer to do Capabilities rather than Alignment work. There is a relevant collective action dynamic here. I’ll then survey the organizational and global levels. After that, I’ll finish by looking very briefly at why investment in Alignment might be worthwhile anyway.
A note on the dichotomous framing of this essay: I understand that the line between Capabilities and Alignment work is blurry; indeed, some Capabilities work plausibly advances Alignment, and some Alignment work advances Capabilities, at least in the short term. However, in order to model the lay of the land, it’s a helpful simplifying assumption to treat Capabilities and Alignment as distinct fields of research and try to understand the motivations of researchers in each.
History
As a historical matter, DeepMind and OpenAI were both founded with explicit missions to create safe, Aligned AI for the benefit of all humanity. There are different views on the extent to which each of these organizations remains aligned to that mission. Some people maintain they are, while others maintain they are doing incredible harm by shortening AI timelines. No one can deny that they have moved at least somewhat in the direction of more profit-making behavior, and are very much focused on Capabilities research. So, at best, they’ve stuck to their original mission but watered it down to allow a certain amount of profit-seeking; at worst, their overall efforts are net-negative for alignment because they accelerate the development of AGI.
OpenAI took investment from Microsoft in January, to the tune of $10b. At the time, they said:
This multi-year, multi-billion dollar investment from Microsoft follows their previous investments in 2019 and 2021, and will allow us to continue our independent research and develop AI that is increasingly safe, useful, and powerful.
And this plausibly reflects a systemic pressure other AI Capabilities researchers will face, too. Because so much more capital is available for Capabilities work, any AI research organization that wants to fund research in AI Safety will be incentivized to do Capabilities research.
On the other hand, it’s striking that no organizations founded with the goal of AI Capabilities research have drifted towards Alignment research over time. Organizations in this category might include John Carmack’s recent start-up, Keen Technologies, Alphabet, and many others. Systemically, this can be explained by the rules of the capitalist environment these organizations work within. If you create a company to do for-profit work and get investors to invest in the project, they’ll expect a return. If you go public, you’ll have a fiduciary duty to obtain a return for investors. For organizations, Alignment doesn’t earn money (except insofar as it improves capabilities for tasks); Capabilities does. As the amount of money available to investors grows, more an...