The Nonlinear Library

AF - Language Models are a Potentially Safe Path to Human-Level AGI by Nadav Brandes



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Language Models are a Potentially Safe Path to Human-Level AGI, published by Nadav Brandes on April 20, 2023 on The AI Alignment Forum.
The core argument: language models are more transparent and less prone to develop agency and superintelligence
I argue that, compared to alternative approaches such as open-ended reinforcement learning, the recent paradigm of achieving human-level AGI with language models has the potential to be relatively safe. There are three main reasons why I believe LM-based AGI systems could be safe:
They operate on text that is intelligible to humans, which makes them relatively interpretable and easier to monitor.
They are subject to weaker pressure to surpass human-level capabilities than systems trained on more open-ended tasks, such as those pursued by reinforcement learning.
Since language models are trained as predictors, there is weaker pressure for them to develop agentic behavior.
I acknowledge that these arguments have been criticized. I will try to defend them, delving into the nuances and explaining how I envision relatively safe LM-based AGI systems. I want to note upfront that although I believe this paradigm is safer than other alternatives I've come across, I still think it poses significant dangers, so I’m not suggesting we should all just chill. It's also difficult to reason about these sorts of things in the abstract, and there's a good chance I'm overlooking critical considerations.
Human-level AGI might be reachable with existing language models
There are indications that GPT-4 could be considered an early form of AGI. But even though it surpasses human level on many standardized tests, as a general-purpose AGI it’s not yet at human level, at least in its vanilla form. For example, it’s still not very good at separating fact from fiction, and I wouldn’t give it full access to my email and social media accounts and ask it to throw a birthday party for me.
To address these limitations, there's an ongoing effort to develop more capable and agentic AI systems based on language models through chained operations such as chain-of-thought reasoning. Auto-GPT is a very early example of this approach, and I expect to see much better LM-based AI systems very soon. To understand this paradigm, we can think of each GPT-4 prompt completion as an atomic operation that can be chained in sophisticated ways to create deeper and more intelligent thought processes. This is analogous to how humans engage in internal conversations to reach better decisions, instead of spitting out the first thought that comes to mind. Like humans, language models can deliver better results if allowed to engage in internal dialogues and access external tools and sources of data.
I can’t think of any cognitive activity that a human can perform instantaneously (through a “single cognitive step”) and GPT-4 can’t, which suggests that it may just be a matter of letting GPT-4 talk to itself for long enough before it approaches human level. An important property of these internal dialogues is that they are produced in plain human language.
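To make the chaining idea concrete, here is a minimal Python sketch. It assumes a hypothetical `complete(prompt)` function standing in for a single GPT-4 prompt completion; `toy_complete` and `run_internal_dialogue` are illustrative names, not part of Auto-GPT or any real API. Each completion is treated as an atomic step, the growing transcript is fed back in at every step, and every intermediate thought is kept as plain text.

```python
from typing import Callable, List


# Hypothetical stand-in for one prompt-completion call (e.g. a single GPT-4
# request). A real system would call a model API here; this stub returns a
# predictable string so the example runs offline.
def toy_complete(prompt: str) -> str:
    n_thoughts = prompt.count("\nThought:")
    if prompt.endswith("Answer:"):
        return f"final answer after {n_thoughts} thoughts"
    return f"thought number {n_thoughts}"


def run_internal_dialogue(
    question: str,
    complete: Callable[[str], str],
    max_steps: int = 3,
) -> List[str]:
    """Chain atomic completions into a longer thought process.

    Each call sees the whole transcript so far, so later steps can build on
    earlier ones. Every intermediate step is stored as plain text that a
    human or an automated monitor can read afterwards.
    """
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        # Ask the model for the next thought, given everything said so far.
        prompt = "\n".join(transcript) + "\nThought:"
        transcript.append(f"Thought: {complete(prompt)}")
    # One final call turns the accumulated reasoning into an answer.
    answer = complete("\n".join(transcript) + "\nAnswer:")
    transcript.append(f"Answer: {answer}")
    return transcript


if __name__ == "__main__":
    for line in run_internal_dialogue("How should I plan a birthday party?", toy_complete):
        print(line)
```

Because the entire transcript is ordinary text, a human reviewer or a separate monitoring process could in principle inspect every step. A practical system would presumably also give the loop access to external tools and data sources.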
It remains to be seen how much progress in AI capabilities will come from 1) devising effective chaining schemes versus 2) improving the underlying language models to make each atomic operation more capable. Given how easy it is to chain existing models compared to how difficult it is to train state-of-the-art language models, there's a good chance that 1 will progress faster than 2. While very few companies have the resources to train more powerful language models, literally anyone can experiment with chain-of-thought prompting, and startups and commercial products in this area are already emerging. I believe there's a chance that AGI systems that are close to human level in ...

The Nonlinear Library, by The Nonlinear Fund
