
Ed and Anna are co-first authors on this work.
TL;DR: Emergent Misalignment found that fine-tuning models on narrowly misaligned data, such as insecure code [...]
---
Outline:
(00:16) TL;DR
(01:19) Introduction
(03:25) Coherent Emergent Misalignment
(07:02) EM with 0.5B Parameters
(08:11) EM with a Full Supervised Finetune
(09:13) EM with a Single Rank 1 LoRA Adapter
(10:01) Future Work
(11:05) Contributions
(11:33) Acknowledgments
The original text contained 6 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong
