Audio note: this article contains 76 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.
Refer to the arXiv preprint for full content. This post is a lighter, 15-minute version.
Abstract
---
Outline:
(00:25) Abstract
(01:26) Introduction
(04:35) Background
(06:55) Conditional Activation Steering
(12:57) Conditioning Refusal: Selectively Steering on Harmful Prompts
(15:40) Programming Refusal: Logical Composition of Refusal Condition
(22:52) Discussion
---
First published:
Source:
Narrated by TYPE III AUDIO.
---