Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A transcript of the TED talk by Eliezer Yudkowsky, published by Mikhail Samin on July 12, 2023 on LessWrong.
The TED talk is available on YouTube and the TED website. Previously, a live recording was published behind a paywall on the conference website and then (likely accidentally) on a random TEDx YouTube channel, but it was subsequently removed.
The transcription was done with Whisper.
You've heard that things are moving fast in artificial intelligence. How fast? So fast that I was suddenly told on Friday that I needed to be here. So, no slides, six minutes.
Since 2001, I've been working on what we would now call the problem of aligning artificial general intelligence: how to shape the preferences and behavior of a powerful artificial mind such that it does not kill everyone.
I more or less founded the field two decades ago when nobody else considered it rewarding enough to work on. I tried to get this very important project started early so we'd be in less of a drastic rush later.
I consider myself to have failed.
Nobody understands how modern AI systems do what they do. They are giant, inscrutable matrices of floating-point numbers that we nudge in the direction of better performance until they inexplicably start working.
At some point, the companies rushing headlong to scale AI will cough out something that's smarter than humanity.
Nobody knows how to calculate when that will happen. My wild guess is that it will happen after zero to two more breakthroughs the size of transformers.
What happens if we build something smarter than us that we understand that poorly?
Some people find it obvious that building something smarter than us that we don't understand might go badly. Others come in with a very wide range of hopeful thoughts about how it might possibly go well. Even if I had 20 minutes for this talk and months to prepare it, I would not be able to refute all the ways people find to imagine that things might go well.
But I will say that there is no standard scientific consensus for how things will go well. There is no hope that has been widely persuasive and stood up to skeptical examination. There is nothing resembling a real engineering plan for us surviving that I could critique. This is not a good place in which to find ourselves.
If I had more time, I'd try to tell you about the predictable reasons why the current paradigm will not work to build a superintelligence that likes you or is friends with you, or that just follows orders.
Why, if you press thumbs up when humans think that things went right or thumbs down when another AI system thinks that they went wrong, you do not get a mind that wants nice things in a way that generalizes well outside the training distribution to where the AI is smarter than the trainers.
You can search for Yudkowsky, List of Lethalities for more.
But to worry, you do not need to believe me about exact predictions of exact disasters. You just need to expect that things are not going to work great on the first really serious, really critical try, because an AI system smart enough to be truly dangerous is meaningfully different from AI systems stupider than that.
My prediction is that this ends up with us facing down something smarter than us that does not want what we want, that does not want anything we recognize as valuable or meaningful.
I cannot predict exactly how a conflict between humanity and a smarter AI would go for the same reason I can't predict exactly how you would lose a chess game to one of the current top AI chess programs, let's say, Stockfish.
If I could predict exactly where Stockfish would move, I could play chess that well myself. I can't predict exactly how you'll lose to Stockfish, but I can predict who wins the game.
I do not expect something actually smart to attack us with marching robot armies with glowing red...