Response to Dileep George: AGI safety warrants planning ahead, by Steve Byrnes, published July 8, 2024 on the AI Alignment Forum.
(Target audience: Dileep George himself, and anyone coming from a similar place.)
Dileep George is a researcher working at the intersection of AI and neuroscience. He started his career by co-founding Numenta in 2005 with Jeff Hawkins (while a Stanford PhD student), then left to co-found Vicarious in 2010 with D. Scott Phoenix, and moved to DeepMind in 2022 when DeepMind acquired Vicarious.
Dileep was recently interviewed by Daniel Faggella on his podcast "The Trajectory" (available on YouTube, Apple Podcasts, and X/Twitter).
It's a fun interview that touched on many topics, most of which I'll ignore, in favor of one very important action-relevant disagreement between Dileep and me.
…And this is the point where everyone these days seems to assume that there are only two possible reasons that anyone would ever bring up the topic of Artificial General Intelligence (AGI) safety in conversation:
The person is advocating for government regulation of large ML training runs
…or the person is advocating against government regulation of large ML training runs.
But, no! That's not my disagreement! That's not why I'm writing this post!! Quite the contrary, I join Dileep in being basically unenthusiastic about governmental regulation of large ML training runs right now.
Instead, this post is advocating for Differential Intellectual Progress within technical AI research of the type that Dileep is doing - and more specifically, I'm advocating in favor of figuring out a technical approach to sculpting AGI motivations in docile and/or prosocial directions (a.k.a. "solving the technical alignment problem") before figuring out the exact data structures and parameter-updating rules that would constitute an AGI's ability to build and query a powerful world-model.
The first half of this post (§1-2) will try to explain what I'm talking about, what it would entail, and why I think it's critically important. The second half of this post (§3) is more specifically my pessimistic response to Dileep's suggestion that, as AGI is gradually developed in the future, people will be able to react and adapt to problems as they arise.
I really think Dileep is a brilliant guy with the best of intentions (e.g. he's a signatory on the Asilomar AI Principles). I just think there are some issues that he hasn't spent much time thinking through. I hope that this post will help.
Post outline:
Section 1 lists some areas of agreement and disagreement between Dileep and me. In particular, we have a giant area of agreement in terms of how we expect future AGI algorithms to work. Our massive common ground here is really why I'm bothering to write this post at all - it makes me hopeful that Dileep & I can have a productive exchange, and not just talk past each other.
Section 2 argues that, for the kind of AGI that Dileep is trying to build, there's an unsolved technical alignment problem: How do we set up this kind of AGI with the motivation to behave in a docile and/or prosocial way?
Section 3 is my pessimistic push-back on Dileep's optimistic hope that, if AGI is developed gradually, then we can regulate or adapt to problems as they arise:
Section 3.1 lists some big, obvious societal problems that have been around for a long time but nevertheless remain unsolved, along with a general discussion of the underlying challenges that have kept them unsolved, and why those same challenges may apply to AGI too.
Section 3.2 dives more specifically into the question of whether we can "keep strong AI as a tool, not a successor", as Dileep hopes. I think it sounds nice but will be impossible to pull off.
Section 3.3 comments that, even if we could react and adapt to AGI given enough time - an assum...