August 05, 2024

“Near-mode thinking on AI” by Olli Järviniemi

9 minutes

There is a stark difference between rehearsing classical AI risk 101 arguments about instrumental convergence and tackling problems like "Design and implement the exact evaluations we'll run on GPT-5 to determine whether it's capable enough that we should worry about it acting aligned until it can execute a takeover".

And naturally, since I've started working on problems like the one above, I've noticed a large shift in my thinking on AI. I describe it as thinking about risks in near-mode, as opposed to far-mode.

In this post, I share a few concrete examples of my experience with this change of orientation.

1. Prerequisites for scheming

Continuing with the example from the intro: A year ago I was confident about the "the AI is just playing along with our training and evaluations, until it is in a position where it can take over" threat model (deceptive alignment / scheming) basically [...]

---

Outline:

(00:41) 1. Prerequisites for scheming

(03:20) 2. A failed prediction

(05:37) 3. Mundane superhuman capabilities

(07:42) 4. Automating alignment research

The original text contained 2 footnotes which were omitted from this narration.

---

First published:

August 4th, 2024

Source:

https://www.lesswrong.com/posts/ASLHfy92vCwduvBRZ/near-mode-thinking-on-ai

---

Narrated by TYPE III AUDIO.

By LessWrong