

There is a stark difference between rehearsing classical AI risk 101 arguments about instrumental convergence, and tackling problems like "Design and implement the exact evaluations we'll run on GPT-5 to determine whether it's capable enough that we should worry about it acting aligned until it can execute a takeover".
And naturally, since I've started working on problems like the one above, I've noticed a large shift in my thinking on AI. I describe it as thinking about risks in near-mode, as opposed to far-mode.
In this post, I share a few concrete examples of my experience with this change of orientation.
1. Prerequisites for scheming
Continuing with the example from the intro: A year ago I was confident about the "the AI is just playing along with our training and evaluations, until it is in a position where it can take over" threat model (deceptive alignment / scheming) basically [...]
---
Outline:
(00:41) 1. Prerequisites for scheming
(03:20) 2. A failed prediction
(05:37) 3. Mundane superhuman capabilities
(07:42) 4. Automating alignment research
The original text contained 2 footnotes which were omitted from this narration.
---
Narrated by TYPE III AUDIO.
By LessWrong
