


TLDR
This is a post announcing a lot of new ARENA material I've been working on for a while, which is now available for study here (currently on the alignment-science branch, but planned to be merged into main this Sunday).
There's a set of exercises (each one contains about 1-2 days of material) on the following topics:
---
Outline:
(00:13) TLDR
(01:49) New material
(01:52) Diagram showing eight AI safety and interpretability concepts including linear probes and activation oracles.
(03:19) (1.3.1) Linear Probes
(04:06) (1.3.4) Activation Oracles
(04:58) (1.4.2) Attribution Graphs
(06:15) (4.1) Emergent Misalignment
(07:05) (4.2) Science of Misalignment
(08:00) (4.3) Reasoning Model Interpretability
(08:52) (4.4) LLM Psychology & Persona Vectors
(09:51) (4.5) Investigator Agents
(10:45) New Site Features
(12:07) Logistics
(12:57) Why use, in vibe-code world?
(15:12) Feedback
The original text contained 1 footnote which was omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
By LessWrong
