LessWrong (30+ Karma)

“Knocking Down My AI Optimist Strawman” by tailcalled


I recently posted my model of an optimistic view of AI, asserting that I disagree with every sentence of it. I thought I might as well also describe my objections to those sentences:

"The rapid progress spearheaded by OpenAI is clearly leading to artificial intelligence that will soon surpass humanity in every way."

Here are some of the main things humanity might want to achieve:

  • Curing aging and other diseases
  • Plentiful clean energy from e.g. nuclear fusion
  • De-escalating nuclear MAD while extending world peace and human freedom
  • ... even if hostile nations would use powerful unaligned AI[1] to fight you
  • Stopping criminals, even if they would make powerful unaligned AI[1] to fight you
  • Educating people to be great and patriotic
  • Creating healthy, tasty food without torturing animals
  • Nice homes for humans near important things
  • Good, open channels for honest, valuable communication
  • Common knowledge of the virtues and vices of executives, professionals [...]

---

Outline:

(00:16) The rapid progress spearheaded by OpenAI is clearly leading to artificial intelligence that will soon surpass humanity in every way.

(01:28) People used to be worried about existential risk from misalignment, yet we have a good idea about what influence current AIs are having on the world, and it is basically going fine.

(03:59) The root problem is that The Sequences expected AGI to develop agency largely without human help; meanwhile actual AI progress occurs by optimizing the scaling efficiency of a pretraining process that is mostly focused on integrating the AI with human culture.

(05:24) This means we will be able to control AI by just asking it to do good things, showing it some examples and giving it some ranked feedback.

(06:15) You might think this is changing with inference-time scaling, yet if the alignment were going to fall apart as new methods come into use, we'd have seen signs of it with o1.

(06:48) In the unlikely case that our current safety will turn out to be insufficient, interpretability research has worked out lots of deeply promising ways to improve, with sparse autoencoders letting us read the minds of the neural networks and thereby screen them for malice, and activation steering letting us deeply control the networks to our hearts' content.

(07:32) AI x-risk worries aren't just a waste of time, though; they are dangerous because they make people think society needs to make use of violence to regulate what kinds of AIs people can make and how they can use them.

(07:58) This danger was visible from the very beginning, as alignment theorists thought one could (and should) make a singleton that would achieve absolute power (by violently threatening humanity, no doubt), rather than always letting AIs be pure servants of humanity.

(10:15) To justify such violence, theorists make up all sorts of elaborate unfalsifiable and unjustifiable stories about how AIs are going to deceive and eventually kill humanity, yet the initial deceptions by base models were toothless, and thanks to modern alignment methods, serious hostility or deception has been thoroughly stamped out.

The original text contained 2 footnotes which were omitted from this narration.

---

First published: February 8th, 2025

Source: https://www.lesswrong.com/posts/TCmj9Wdp5vwsaHAas/knocking-down-my-ai-optimist-strawman

---

Narrated by TYPE III AUDIO.
