
The UK's AI Security Institute published its research agenda yesterday. This post gives more details about how the Alignment Team is thinking about our agenda.
Summary: The AISI Alignment Team focuses on research relevant to reducing safety and security risks from AI systems that autonomously pursue courses of action that could lead to egregious harm and that are not under human control. No known technical mitigations are reliable past AGI.
Our plan is to break down promising alignment agendas by developing safety case sketches. We'll use these sketches to identify specific gaps in current approaches. We expect that many of these gaps can be formulated as well-defined subproblems within existing fields (e.g., theoretical computer science). By identifying researchers with relevant expertise who aren't currently working on alignment and funding their work on these subproblems, we hope to substantially increase parallel progress on alignment.
[...]
---
Outline:
(01:41) 1. Why safety case-oriented alignment research?
(03:33) 2. Our initial focus: honesty and asymptotic guarantees
(07:07) Example: Debate safety case sketch
(08:58) 3. Future work
(09:02) Concrete open problems in honesty
(12:13) More details on our empirical approach
(14:23) Moving beyond honesty: automated alignment
(15:36) 4. List of open problems we'd like to see solved
(15:53) 4.1 Empirical problems
(17:57) 4.2 Theoretical problems
(21:23) Collaborate with us
---
Narrated by TYPE III AUDIO.