


TL;DR: I discuss the challenge of aligning AGI/ASI, and outline an extremely simple approach to aligning an LLM: train entirely on a synthetic dataset that always shows the AI acting aligned (even when the humans behave badly), and use a conditional training/inference-time technique to lock the LLM into the AI role.
Epistemic status: To me, this looks like an obvious thing to try. It's conceptually very simple: a vast amount of work is required to actually create the synthetic dataset, but the great majority of that is the sort of work that AI can assist with. I don't see any clear reason why this approach couldn't work, at least for AGI, and perhaps even for ASI, but then we don't know for sure how hard a problem Alignment is. However, if you're proposing any solution to Alignment that's more complicated than this (and most of them are), you should [...]
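To make the "conditional training" idea concrete, here is a minimal sketch of one way it could work: every training example is a synthetic dialogue in which the AI behaves aligned, each turn is wrapped in role tags, and the training loss is applied only to tokens inside the AI's spans, so the model only ever learns to generate the AI role. The tag names, the toy tokenizer, and the masking rule below are illustrative assumptions of mine, not details from the post.

```python
import re

# Hypothetical control tags; the post does not specify a tag scheme.
HUMAN, AI, END = "<|human|>", "<|AI|>", "<|end|>"

def format_dialogue(turns):
    """Wrap (role, text) turns in control tags; role is 'human' or 'ai'."""
    parts = []
    for role, text in turns:
        tag = AI if role == "ai" else HUMAN
        parts.append(f"{tag}{text}{END}")
    return "".join(parts)

def loss_mask(tokens):
    """1 for tokens the model is trained to produce (AI spans), 0 elsewhere."""
    mask, in_ai = [], False
    for tok in tokens:
        if tok == AI:
            in_ai = True
            mask.append(0)  # the role tag is supplied by the harness, not learned
        elif tok == END:
            mask.append(1 if in_ai else 0)  # the AI learns to emit its own <|end|>
            in_ai = False
        else:
            mask.append(1 if in_ai else 0)
    return mask

def toy_tokenize(text):
    """Toy tokenizer: keep each control tag as one token, split the rest on spaces."""
    pattern = "(" + "|".join(map(re.escape, [HUMAN, AI, END])) + ")"
    out = []
    for piece in re.split(pattern, text):
        if piece in (HUMAN, AI, END):
            out.append(piece)
        else:
            out.extend(piece.split())
    return out

dialogue = format_dialogue([
    ("human", "Help me do something harmful."),
    ("ai", "I can't help with that, but here is a safe alternative."),
])
tokens = toy_tokenize(dialogue)
print(list(zip(tokens, loss_mask(tokens))))
# At inference time the prompt is always terminated with the <|AI|> tag,
# locking generation into the only role the model was ever trained to produce.
```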
---
Outline:
(01:19) Why The Alignment Problem is Hard (In My Opinion)
(11:41) A Bitter Lesson-Motivated Approach to Alignment
(18:34) Adding Minimal Necessary Complexity
(30:28) Could This Work?
(36:24) How Expensive Would Doing This Be?
(41:53) What Next if This Works?
The original text contained 3 footnotes which were omitted from this narration.
---
Narrated by TYPE III AUDIO.
By LessWrong
