The Nonlinear Library

AF - Thoughts on sharing information about language model capabilities by Paul Christiano



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Thoughts on sharing information about language model capabilities, published by Paul Christiano on July 31, 2023 on The AI Alignment Forum.
Core claim
I believe that sharing information about the capabilities and limits of existing ML systems, and especially language model agents, significantly reduces risks from powerful AI - despite the fact that such information may increase the amount or quality of investment in ML generally (or in LM agents in particular).
Concretely, I mean to include information like: tasks and evaluation frameworks for LM agents, the results of evaluations of particular agents, discussions of the qualitative strengths and weaknesses of agents, and information about agent design that may represent small improvements over the state of the art (insofar as that information is hard to decouple from evaluation results).
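As an entirely hypothetical illustration of the first category, an evaluation task for an LM agent might be specified as little more than a goal, a tool allowance, and a scoring rule. None of the field names below are taken from ARC Evals' actual framework; this is only a sketch of the kind of information at issue:

```python
# Hypothetical example of what a task specification in an LM agent
# evaluation framework might contain. All field names are illustrative,
# not drawn from any real framework.
from dataclasses import dataclass, field

@dataclass
class AgentTask:
    name: str                 # short identifier for the task
    instructions: str         # natural-language goal given to the agent
    allowed_tools: list[str]  # tools the agent may call (e.g. "browser")
    max_steps: int            # step budget before the run is scored a failure
    success_check: str        # description of how a run is judged successful

example = AgentTask(
    name="find_contact_email",
    instructions="Find the press contact email address listed on example.com.",
    allowed_tools=["browser"],
    max_steps=20,
    success_check="The agent's final answer matches the address on the page.",
)
```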
Context
ARC Evals currently focuses on evaluating the capabilities and limitations of existing ML systems, with an aim towards understanding whether or when they may be capable enough to pose catastrophic risks. Current evaluations are particularly focused on monitoring progress in language model agents.
I believe that sharing this kind of information significantly improves society's ability to handle risks from AI, and so I am encouraging the team to share more information. However, this issue is certainly not straightforward, and in some places (particularly in the EA community, where this post is being shared) I believe my position is controversial.
I'm writing this post at the request of the Evals team to lay out my views publicly. I am speaking only for myself. I believe the team is broadly sympathetic to my position, but would prefer to see a broader and more thorough discussion about this question.
I do not think this post presents a complete or convincing argument for my beliefs. The purpose is mostly to outline and explain the basic view, at a similar level of clarity and thoroughness to the arguments against sharing information (which have mostly not been laid out explicitly).
Accelerating LM agents seems neutral (or maybe positive)
I believe that accelerating LM agents increases safety through two channels:
1. LM agents are an unusually safe way to build powerful AI systems. Existing concerns about AI takeover are driven primarily by scaling up black-box optimization, and increasing our reliance on human-comprehensible decompositions with legible interfaces seems like it significantly improves safety (a toy sketch of such a decomposition appears after this list). I think this is a large effect size and is suggested by several independent lines of reasoning.
2. To the extent that existing LM agents are weak, it is due to exceptionally low investment. This is creating "dry tinder": as incentives rise, that investment will quickly rise and so low-hanging fruit will be picked. While there is some dependence on serial time, I think that increased LM investment now will significantly slow down progress later.
I will discuss these mechanisms in more detail in the rest of this section.
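To make the first channel more concrete, here is a minimal, hypothetical sketch of what a "human-comprehensible decomposition with legible interfaces" might look like. Nothing below is from the post: `call_model`, `run_tool`, and the THINK/TOOL/ANSWER protocol are illustrative placeholders. The safety-relevant property is that the agent's reasoning and tool use pass through natural-language checkpoints that can be logged and monitored, rather than through opaque learned activations:

```python
# Hypothetical sketch (not from the post) of a "legible" LM agent loop:
# every intermediate artifact (plan, tool call, observation) is
# human-readable text that an overseer can log and audit, in contrast
# to an end-to-end black-box policy.

def call_model(prompt: str) -> str:
    """Placeholder for a language model API call."""
    raise NotImplementedError

def run_tool(name: str, argument: str) -> str:
    """Placeholder for a sandboxed tool (search, calculator, etc.)."""
    raise NotImplementedError

def legible_agent(task: str, max_steps: int = 10) -> str:
    # The agent's entire state is this natural-language transcript, so a
    # human (or automated monitor) can read exactly what it "knows" and plans.
    transcript = [f"TASK: {task}"]
    for _ in range(max_steps):
        step = call_model("\n".join(transcript) + "\nNext step (THINK:/TOOL:/ANSWER:):")
        transcript.append(step)
        if step.startswith("ANSWER:"):
            return step[len("ANSWER:"):].strip()
        if step.startswith("TOOL:"):
            # Tool calls pass through a narrow, inspectable interface.
            name, _, argument = step[len("TOOL:"):].strip().partition(" ")
            transcript.append(f"OBSERVATION: {run_tool(name, argument)}")
    return "Step budget exhausted; full transcript remains available for review."
```

An end-to-end black-box policy would map the task directly to actions with no inspectable intermediate state; the decomposed version exposes every step to oversight.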
I also think that accelerating LM agents will drive investment in improving and deploying ML systems, and so can reduce the time available to react to risk. As a result I'm ambivalent about the net effect of improving the design of LM agents - my personal tentative guess is that it's positive, but I would be hesitant about deliberately accelerating LM agents to improve safety (moreover, I think this would be a very unleveraged approach to improving safety and would strongly discourage anyone from pursuing it).
But this means I am significantly less concerned that information about LM agent capabilities will accelerate progress on LM agents. Given that I am already positively disposed towards sharing information about ML capabilities and limitations despite the risk of acceleration, I am particularly positive about sharing...