This was too long to be a short-form, but it should really be a short-form.
This notice is useful for people who've recently got into AI safety and want to engage with the ancient texts (i.e. pre-2024). If you were around before 2023, then you probably don't need this.
A few phrases have changed their meaning over time. Two examples that came to mind recently are scheming and mech interp. (In both cases, I think the change-of-terminology was reasonable.) There are probably a bunch of other examples — feel free to mention them in the comments.
Scheming.
This used to mean "training-gaming in pursuit of out-of-context goals". For example, Carlsmith (Nov 2023) starts with:
This report examines whether advanced AIs that perform well in training will be doing so in order to gain power later -- a behavior I call "scheming" (also sometimes called "deceptive alignment").
Then Apollo came out with "Frontier Models are Capable of In-context Scheming" (Dec 2024):
We study whether models have the capability to scheme in pursuit of a goal that we provide in-context and instruct the model to strongly follow.
So the difference here is (1) the AI isn't in training (it's in [...]
---
Outline:
(00:47) Scheming.
(02:12) Mech interp.
---
Narrated by TYPE III AUDIO.
By LessWrong