
Sign up to save your podcasts
Or


Epistemic Status: Over years of reading alignment plans and studying agent foundations, this is my first serious attempt to formulate an alignment research program that I (Cole Wyeth) have not been able to find any critical flaws in. It is far from a complete solution, but I think it is a meaningful decomposition of the problem into modular pieces that can be addressed by technical means - that is, it seems to solve many of the philosophical barriers to AI alignment. I have attempted to make the necessary assumptions clear throughout. The main reason that I am excited about this plan is that the assumptions seem acceptable to both agent foundations researchers and ML engineers; that is, I do not believe there are any naive assumptions about the nature of intelligence OR any computationally intractable obstacles to implementation. This program (tentatively ARAD := Adversarially Robust Augmentation and Distillation) owes [...]
---
Outline:
(04:31) High-level Implementation
(06:03) Technical Justification
(17:45) Implementation Details
The original text contained 1 footnote which was omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrongEpistemic Status: Over years of reading alignment plans and studying agent foundations, this is my first serious attempt to formulate an alignment research program that I (Cole Wyeth) have not been able to find any critical flaws in. It is far from a complete solution, but I think it is a meaningful decomposition of the problem into modular pieces that can be addressed by technical means - that is, it seems to solve many of the philosophical barriers to AI alignment. I have attempted to make the necessary assumptions clear throughout. The main reason that I am excited about this plan is that the assumptions seem acceptable to both agent foundations researchers and ML engineers; that is, I do not believe there are any naive assumptions about the nature of intelligence OR any computationally intractable obstacles to implementation. This program (tentatively ARAD := Adversarially Robust Augmentation and Distillation) owes [...]
---
Outline:
(04:31) High-level Implementation
(06:03) Technical Justification
(17:45) Implementation Details
The original text contained 1 footnote which was omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.

26,311 Listeners

2,461 Listeners

8,597 Listeners

4,170 Listeners

97 Listeners

1,608 Listeners

10,041 Listeners

97 Listeners

528 Listeners

5,529 Listeners

16,055 Listeners

570 Listeners

138 Listeners

93 Listeners

473 Listeners