


Audio note: this article contains 76 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.
Refer to the arXiv preprint for full content. This post is a lighter, 15-minute version.
Abstract
---
Outline:
(00:25) Abstract
(01:26) Introduction
(04:35) Background
(06:55) Conditional Activation Steering
(12:57) Conditioning Refusal: Selectively Steering on Harmful Prompts
(15:40) Programming Refusal: Logical Composition of Refusal Condition
(22:52) Discussion
---
First published:
Source:
Narrated by TYPE III AUDIO.
---
By LessWrong
