The Nonlinear Library

AF - AXRP Episode 20 - ‘Reform’ AI Alignment with Scott Aaronson by DanielFilan


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AXRP Episode 20 - ‘Reform’ AI Alignment with Scott Aaronson, published by DanielFilan on April 12, 2023 on The AI Alignment Forum.
Google Podcasts link
How should we scientifically think about the impact of AI on human civilization, and whether or not it will doom us all? In this episode, I speak with Scott Aaronson about his views on how to make progress in AI alignment, as well as his work on watermarking the output of language models, and how he moved from a background in quantum complexity theory to working on AI.
Topics we discuss:
‘Reform’ AI alignment
Epistemology of AI risk
Immediate problems and existential risk
Aligning deceitful AI
Stories of AI doom
Language models
Democratic governance of AI
What would change Scott’s mind
Watermarking language model outputs
Watermark key secrecy and backdoor insertion
Scott’s transition to AI research
Theoretical computer science and AI alignment
AI alignment and formalizing philosophy
How Scott finds AI research
Following Scott’s research
Daniel Filan: Hello, everyone. In this episode, I’ll be speaking with Scott Aaronson. Scott is a professor of computer science at UT Austin and he’s currently spending a year as a visiting scientist at OpenAI working on the theoretical foundations of AI safety. We’ll be talking about his view of the field, as well as the work he’s doing at OpenAI. For links to what we’re discussing, you can just check the description of this episode and you can read the transcript at axrp.net. Scott, welcome to AXRP.
Scott Aaronson: Thank you. Good to be here.
‘Reform’ AI alignment
Epistemology of AI risk
Daniel Filan: So you recently wrote this blog post about something you called reform AI alignment: basically your take on AI alignment that’s somewhat different from what you see as a traditional view or something. Can you tell me a little bit about, do you see AI causing or being involved in a really important way in existential risk anytime soon, and if so, how?
Scott Aaronson: Well, I guess it depends what you mean by soon. I am not a very good prognosticator. I feel like even in quantum computing theory, which is this tiny little part of the intellectual world where I’ve spent 25 years of my life, I can’t predict very well what’s going to be discovered a few years from now in that, and if I can’t even do that, then how much less can I predict what impacts AI is going to have on human civilization over the next century? Of course, I can try to play the Bayesian game, and I even will occasionally accept bets if I feel really strongly about something, but I’m also kind of a wuss.
I’m a little bit risk-averse, and I like to tell people whenever they ask me ‘how soon will AI take over the world?’, or before that, it was more often, ‘how soon will we have a fault-tolerant quantum computer?’. They don’t want all the considerations and explanations that I can offer, they just want a number, and I like to tell them, “Look, if I were good at that kind of thing, I wouldn’t be a professor, would I? I would be an investor and I would be a multi-billionaire.” So I feel like probably, there are some people in the world who can just consistently see what is coming in decades and get it right. There are hedge funds that are consistently successful (not many), but I feel like the way that science has made progress for hundreds of years has not been to try to prognosticate the whole shape of the future.
It’s been to look a little bit ahead, look at the problems that we can see right now that could actually be solved, and rather than predicting 10 steps ahead of the future, you just try to create the next step ahead of the future and try to steer it in what looks like a good direction, and I feel like that is what I try to do as a scientist.
And I’ve known the rationalist community, the AI risk community since maybe no...