


Executive Summary
---
Outline:
(00:06) Executive Summary
(01:25) Introduction
(03:23) Background and methodology
(06:30) Results
(06:33) Base models refuse harmful requests
(08:21) Eliciting more base model refusals with steering vectors
(11:00) Bypassing refusal in base models
(12:22) Investigating (pre-ChatGPT model) LLaMA 1 7B
(16:32) Related work
(17:44) Conclusion
(19:50) Citing this work
(20:04) Author contributions statement
The original text contained 3 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong
