


Executive Summary
---
Outline:
(00:06) Executive Summary
(01:25) Introduction
(03:23) Background and methodology
(06:30) Results
(06:33) Base models refuse harmful requests
(08:21) Eliciting more base model refusals with steering vectors
(11:00) Bypassing refusal in base models
(12:22) Investigating (pre-ChatGPT model) LLaMA 1 7B
(16:32) Related work
(17:44) Conclusion
(19:50) Citing this work
(20:04) Author contributions statement
The original text contained 3 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
---
By LessWrong
