
Summary and Table of Contents
The goal of this post is to discuss the so-called “sharp left turn”, the lessons that we learn from analogizing evolution to AGI development, and the claim that “capabilities generalize farther than alignment” … and the competing claims that all three of those things are complete baloney. In particular,
---
Outline:
(00:06) Summary and Table of Contents
(03:15) 1. Background: Autonomous learning
(03:21) 1.1 Intro
(08:48) 1.2 More on discernment in human math
(11:11) 1.3 Three ingredients to progress: (1) generation, (2) selection, (3) open-ended accumulation
(14:04) 1.4 Judgment via experiment, versus judgment via discernment
(18:23) 1.5 Where do foundation models fit in?
(20:35) 2. The sense in which capabilities generalize further than alignment
(20:42) 2.1 Quotes
(24:20) 2.2 In terms of the (1-3) triad
(26:38) 3. Definitely-not-evolution-I-swear Provides Evidence for the Sharp Left Turn
(26:45) 3.1 Evolution per se isn't the tightest analogy we have to AGI
(28:20) 3.2 The story of Ev
(31:41) 3.3 Ways that Ev would have been surprised by exactly how modern humans turned out
(34:21) 3.4 The arc of progress is long, but it bends towards wireheading
(37:03) 3.5 How does Ev feel, overall?
(41:18) 3.6 Spelling out the analogy
(41:42) 3.7 Just how sharp is this left turn?
(45:13) 3.8 Objection: In this story, Ev is pretty stupid. Many of those surprises were in fact readily predictable! Future AGI programmers can do better.
(46:19) 3.9 Objection: We have tools at our disposal that Ev above was not using, like better sandbox testing, interpretability, corrigibility, and supervision
(48:17) 4. The sense in which alignment generalizes further than capabilities
(49:34) 5. Contrasting the two sides
(50:25) 5.1 Three ways to feel optimistic, and why I'm somewhat skeptical of each
(50:33) 5.1.1 The argument that humans will stay abreast of the (1-3) loop, possibly because they're part of it
(52:34) 5.1.2 The argument that, even if an AI is autonomously running a (1-3) loop, that will not undermine obedient (or helpful, or harmless, or whatever) motivation
(57:18) 5.1.3 The argument that we can and will do better than Ev
(59:27) 5.2 A fourth, cop-out option
The original text contained 3 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.