The Nonlinear Library

AF - AXRP Episode 20 - ‘Reform’ AI Alignment with Scott Aaronson by DanielFilan


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AXRP Episode 20 - ‘Reform’ AI Alignment with Scott Aaronson, published by DanielFilan on April 12, 2023 on The AI Alignment Forum.
Google Podcasts link
How should we scientifically think about the impact of AI on human civilization, and whether or not it will doom us all? In this episode, I speak with Scott Aaronson about his views on how to make progress in AI alignment, as well as his work on watermarking the output of language models, and how he moved from a background in quantum complexity theory to working on AI.
Topics we discuss:
‘Reform’ AI alignment
Epistemology of AI risk
Immediate problems and existential risk
Aligning deceitful AI
Stories of AI doom
Language models
Democratic governance of AI
What would change Scott’s mind
Watermarking language model outputs
Watermark key secrecy and backdoor insertion
Scott’s transition to AI research
Theoretical computer science and AI alignment
AI alignment and formalizing philosophy
How Scott finds AI research
Following Scott’s research
Daniel Filan: Hello, everyone. In this episode, I’ll be speaking with Scott Aaronson. Scott is a professor of computer science at UT Austin and he’s currently spending a year as a visiting scientist at OpenAI working on the theoretical foundations of AI safety. We’ll be talking about his view of the field, as well as the work he’s doing at OpenAI. For links to what we’re discussing, you can just check the description of this episode and you can read the transcript at axrp.net. Scott, welcome to AXRP.
Scott Aaronson: Thank you. Good to be here.
‘Reform’ AI alignment
Epistemology of AI risk
Daniel Filan: So you recently wrote this blog post about something you called reform AI alignment: basically your take on AI alignment that’s somewhat different from what you see as a traditional view or something. Can you tell me a little bit about, do you see AI causing or being involved in a really important way in existential risk anytime soon, and if so, how?
Scott Aaronson: Well, I guess it depends what you mean by soon. I am not a very good prognosticator. I feel like even in quantum computing theory, which is this tiny little part of the intellectual world where I’ve spent 25 years of my life, I can’t predict very well what’s going to be discovered a few years from now in that, and if I can’t even do that, then how much less can I predict what impacts AI is going to have on human civilization over the next century? Of course, I can try to play the Bayesian game, and I even will occasionally accept bets if I feel really strongly about something, but I’m also kind of a wuss.
I’m a little bit risk-averse, and I like to tell people whenever they ask me ‘how soon will AI take over the world?’, or before that, it was more often, ‘how soon will we have a fault-tolerant quantum computer?’. They don’t want all the considerations and explanations that I can offer, they just want a number, and I like to tell them, “Look, if I were good at that kind of thing, I wouldn’t be a professor, would I? I would be an investor and I would be a multi-billionaire.” So I feel like probably, there are some people in the world who can just consistently see what is coming in decades and get it right. There are hedge funds that are consistently successful (not many), but I feel like the way that science has made progress for hundreds of years has not been to try to prognosticate the whole shape of the future.
It’s been to look a little bit ahead, look at the problems that we can see right now that could actually be solved, and rather than predicting 10 steps ahead of the future, you just try to create the next step ahead of the future and try to steer it in what looks like a good direction, and I feel like that is what I try to do as a scientist.
And I’ve known the rationalist community, the AI risk community since maybe no...