August 05, 2025

“Narrow finetuning is different” by cloud, Stewy Slocum

6 minutes

Epistemic status: an informal note.

It is common to use finetuning on a narrow data distribution, or narrow finetuning (NFT), to study AI safety. In these experiments, a model is trained on a very specific type of data, then evaluated for broader properties, such as a capability or general disposition.

Ways that narrow finetuning is different

Narrow finetuning is different from the training procedures that frontier AI companies use, such as pretraining on the internet or posttraining on a diverse mixture of data and tasks. Here are some ways it is different:

Underspecification of broader behavior - training a model on a narrow data distribution means that most of the model's behavior (behavior outside the training distribution) is not incorporated in the loss. This means that all sorts of undesired, degenerate, or unusual behavior can arise that would normally be prevented by the loss function (e.g., as in emergent [...]

---

Outline:

(00:31) Ways that narrow finetuning is different

(02:08) Anecdote

(03:05) Examples

(03:37) Counterpoints

(04:54) Takeaways

The original text contained 1 footnote which was omitted from this narration.

---

First published:

August 5th, 2025

Source:

https://www.lesswrong.com/posts/7emjxGADozzm7uwKL/narrow-finetuning-is-different

---

Narrated by TYPE III AUDIO.

By LessWrong