Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds", published by mattmacdermott on February 29, 2024 on LessWrong.
Yoshua Bengio recently posted a high-level overview of his alignment research agenda on his blog. I'm pasting the full text below since it's fairly short.
What can't we afford with a future superintelligent AI? Among other things, confidently wrong predictions about the harm that some actions could yield. Especially catastrophic harm. Especially if those actions could spell the end of humanity.
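To make that failure mode concrete, here is a minimal sketch (an editor's illustration under stated assumptions, not Bengio's actual method) of the kind of cautious decision rule the title's "convergent safety bounds" gestures at: rather than trusting a single, possibly confidently wrong model, aggregate harm estimates over a posterior of plausible world models and veto any action whose harm bound exceeds a small risk budget. `WorldModel`, `harm_probability`, and the `risk_budget` value are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class WorldModel:
    """A hypothetical theory of the world that scores actions for harm."""
    name: str
    harm_probability: Callable[[str], float]  # maps an action to P(harm)

def cautious_veto(
    action: str,
    posterior: List[Tuple[WorldModel, float]],  # (model, posterior weight)
    risk_budget: float = 1e-3,
) -> bool:
    """Veto the action if its posterior-averaged harm probability is too high.

    The point: no single overconfident "this is harmless" prediction can
    green-light the action, because every plausible model gets a vote
    weighted by how well it explains the evidence.
    """
    harm_bound = sum(w * m.harm_probability(action) for m, w in posterior)
    return harm_bound > risk_budget

# Two theories disagree about an action; the cautious rule refuses to act
# even though the higher-weight model confidently predicts no harm.
optimist = WorldModel("optimist", lambda a: 0.0)    # confidently predicts no harm
pessimist = WorldModel("pessimist", lambda a: 0.2)  # sees a 20% chance of harm
print(cautious_veto("deploy", [(optimist, 0.9), (pessimist, 0.1)]))  # True
```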
How can we design an AI that will be highly capable and will not harm humans? In my opinion, we need to answer this question of controlling AI so that it behaves in reliably safe ways before we reach human-level AI, aka AGI; and to succeed, we need all hands on deck. Economic and military pressures to accelerate advances in AI capabilities will continue to push capabilities forward even if we have not figured out how to make superintelligent AI safe.
And even if some regulations and treaties are put into place to reduce the risks, it is plausible that human greed for power and wealth, and the forces propelling competition between humans, corporations and countries, will continue to speed up dangerous technological advances.
Right now, science has no clear answer to this question of how to control AI and align its intentions and behavior with democratically chosen values. It is a bit like in the movie "Don't Look Up": some scientists have arguments about the plausibility of scenarios (e.g., see "Human Compatible") in which the equivalent of a planet-killing asteroid is headed straight towards us and may already be approaching the atmosphere.
In the case of AI there is more uncertainty, both about the probability of different scenarios (including about future public policies) and about the timeline, which could be years or decades according to leading AI researchers. And there are no convincing scientific arguments that contradict these scenarios and reassure us with certainty, nor is there any known method to "deflect the asteroid", i.e., to avoid catastrophic outcomes from future powerful AI systems.
With the survival of humanity at stake, we should invest massively in this scientific problem, to understand this asteroid and discover ways to deflect it. Given the stakes, our responsibility to humanity, our children and grandchildren, and the enormity of the scientific problem, I believe this to be the most pressing challenge in computer science, one that will dictate our collective wellbeing as a species.
Solving it could of course help us greatly with many other challenges, including disease, poverty and climate change, because AI clearly has beneficial uses. In addition to this scientific problem, there is also a political problem that needs attention: how do we make sure that no one triggers a catastrophe or takes over political power when AGI becomes widely available, or even as we approach it? See this article of mine in the Journal of Democracy on this topic.
In this blog post, I will focus on an approach to the scientific challenge of AI control and alignment. Given the stakes, I find it particularly important to focus on approaches which give us the strongest possible AI safety guarantees. Over the last year, I have been thinking about this and I started writing about it in this May 2023 blog post (also see my December 2023 Alignment Workshop keynote presentation).
Here, I will spell out some key thoughts that have emerged as my thinking on this topic has matured, and that are driving my current main research focus.
I have received funding to explore this research program, and I am looking for researchers motivated by existential risk, with expertise spanning mathematics (especially probabilistic methods) and machine learning (especially amorti...