
Epistemic Status: After years of reading alignment plans and studying agent foundations, this is my first serious attempt to formulate an alignment research program in which I (Cole Wyeth) have not been able to find any critical flaws. It is far from a complete solution, but I think it is a meaningful decomposition of the problem into modular pieces that can be addressed by technical means - that is, it seems to resolve many of the philosophical barriers to AI alignment. I have attempted to make the necessary assumptions clear throughout. The main reason I am excited about this plan is that the assumptions seem acceptable to both agent foundations researchers and ML engineers; that is, I do not believe it relies on any naive assumptions about the nature of intelligence or faces any computationally intractable obstacles to implementation. This program (tentatively ARAD := Adversarially Robust Augmentation and Distillation) owes [...]
---
Outline:
(04:31) High-level Implementation
(06:03) Technical Justification
(17:45) Implementation Details
The original text contained 1 footnote which was omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong
