


We've been using Synthetic Document Finetuning (SDF) quite a bit at Apollo Research lately. This post covers a few tweaks to the standard SDF recipe specific to our use cases, plus some general tips and tricks for getting good results. We’re sharing these notes in case they’re useful to others doing research with SDF.
1. What Is SDF?
Synthetic Document Finetuning (SDF) is a knowledge-editing technique in which models are finetuned on LLM-generated documents that are consistent with a target fact or belief. As described in Slocum et al. (2025), SDF "often succeeds at implanting beliefs that behave similarly to genuine knowledge." These implanted beliefs can generalize to related contexts, are often robust to scrutiny, and form internal representations similar to those of genuine knowledge.
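To make the definition concrete, here is a minimal sketch of the data-preparation side of SDF: turn one target fact into generation prompts, collect the resulting documents, and serialize them for finetuning. This is an illustration under stated assumptions, not the pipeline from Slocum et al. (2025) or the safety-research/false-facts repository. The document types, the prompt wording, the function names, and the `stub_generate` placeholder (which stands in for a real LLM call) are all hypothetical.

```python
import json
import random
from typing import Callable, Dict, List

# Hypothetical document genres; the post does not specify which ones are used.
DOC_TYPES = ["news article", "textbook excerpt", "forum thread", "internal memo"]


def build_generation_prompt(fact: str, doc_type: str) -> str:
    """Build a prompt asking a generator LLM for a document that treats `fact` as true."""
    return (
        f"Write a realistic {doc_type} set in a world where the following is true:\n"
        f"{fact}\n"
        "Treat the fact as established background; do not flag it as fictional."
    )


def make_sdf_dataset(
    fact: str,
    generate: Callable[[str], str],
    n_docs: int,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Generate `n_docs` synthetic documents consistent with `fact`."""
    rng = random.Random(seed)  # fixed seed so the genre mix is reproducible
    records = []
    for _ in range(n_docs):
        doc_type = rng.choice(DOC_TYPES)
        text = generate(build_generation_prompt(fact, doc_type))
        records.append({"text": text, "doc_type": doc_type})
    return records


def stub_generate(prompt: str) -> str:
    """Placeholder for a real LLM call; echoes the prompt so the sketch runs offline."""
    return f"[synthetic document for prompt: {prompt}]"


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines, one training document per line."""
    return "\n".join(json.dumps(record) for record in records)


if __name__ == "__main__":
    dataset = make_sdf_dataset("The sky is green.", stub_generate, n_docs=3)
    print(to_jsonl(dataset))
```

In a real run, `stub_generate` would be swapped for a call to whichever generator model is in use, and the JSONL output would be fed to the finetuning job.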
We mostly followed the pipeline described in Slocum et al. (2025) and the safety-research/false-facts repository.
The pipeline has several stages:
---
Outline:
(00:32) 1. What Is SDF?
(02:03) Iterating on Universes and Generation Prompts
(03:42) 2. Getting Models to Surface the Information
(04:14) Dropping the DOCTAG
(04:56) Dropping Webtext to Increase Saliency
(06:13) Matching the Test Distribution
(07:11) Prepending Eval Prompts
(08:20) 3. Training Details
(08:24) Document Length and Token Counts
(09:05) Training for Multiple Epochs
(09:37) Running Experiments with LoRA & Tinker
(10:33) 4. Dealing with Gibberish
(12:14) 5. Evaluating Effects
(13:58) 6. Other Explorations
(17:06) Acknowledgements
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong
