Summary and Table of Contents

The goal of this post is to discuss the so-called “sharp left turn”, the lessons that we learn from analogizing evolution to AGI development, and the claim that “capabilities generalize farther than alignment” … and the competing claims that all three of those things are complete baloney. In particular,
- Section 1 talks about “autonomous learning”, the related human ability to discern whether ideas hang together and make sense, and whether and how that applies to current and future AIs.
- Section 2 presents the case that “capabilities generalize farther than alignment”, by analogy with the evolution of humans.
- Section 3 argues that the analogy between AGI and the evolution of humans is not a great analogy. Instead, I offer a new and (I claim) better analogy between AGI training and, umm, a weird fictional story that has a lot to do with the [...]
---
Outline:

(00:06) Summary and Table of Contents
(03:15) 1. Background: Autonomous learning
(03:21) 1.1 Intro
(08:48) 1.2 More on discernment in human math
(11:11) 1.3 Three ingredients to progress: (1) generation, (2) selection, (3) open-ended accumulation
(14:04) 1.4 Judgment via experiment, versus judgment via discernment
(18:23) 1.5 Where do foundation models fit in?
(20:35) 2. The sense in which capabilities generalize further than alignment
(20:42) 2.1 Quotes
(24:20) 2.2 In terms of the (1-3) triad
(26:38) 3. Definitely-not-evolution-I-swear Provides Evidence for the Sharp Left Turn
(26:45) 3.1 Evolution per se isn't the tightest analogy we have to AGI
(28:20) 3.2 The story of Ev
(31:41) 3.3 Ways that Ev would have been surprised by exactly how modern humans turned out
(34:21) 3.4 The arc of progress is long, but it bends towards wireheading
(37:03) 3.5 How does Ev feel, overall?
(41:18) 3.6 Spelling out the analogy
(41:42) 3.7 Just how sharp is this left turn?
(45:13) 3.8 Objection: In this story, Ev is pretty stupid. Many of those surprises were in fact readily predictable! Future AGI programmers can do better.
(46:19) 3.9 Objection: We have tools at our disposal that Ev above was not using, like better sandbox testing, interpretability, corrigibility, and supervision
(48:17) 4. The sense in which alignment generalizes further than capabilities
(49:34) 5. Contrasting the two sides
(50:25) 5.1 Three ways to feel optimistic, and why I'm somewhat skeptical of each
(50:33) 5.1.1 The argument that humans will stay abreast of the (1-3) loop, possibly because they're part of it
(52:34) 5.1.2 The argument that, even if an AI is autonomously running a (1-3) loop, that will not undermine obedient (or helpful, or harmless, or whatever) motivation
(57:18) 5.1.3 The argument that we can and will do better than Ev
(59:27) 5.2 A fourth, cop-out option
The original text contained 3 footnotes which were omitted from this narration.

---
First published: January 28th, 2025
Source: https://www.lesswrong.com/posts/2yLyT6kB7BQvTfEuZ/sharp-left-turn-discourse-an-opinionated-review
---
Narrated by