LessWrong (30+ Karma)

“Understanding and Controlling LLM Generalization” by Daniel Tan



A distillation of my long-term research agenda and current thinking. I welcome takes on this.

Why study generalization? 

I'm interested in studying how LLMs generalise: when presented with multiple policies that achieve similar loss, which ones tend to be learned by default?

I claim this is pretty important for AI safety:

  • Re: developing safe general intelligence, we will never be able to train an LLM on all the contexts it will see at deployment. To prevent goal misgeneralization, it's necessary to understand how LLMs generalise from their training to out-of-distribution (OOD) contexts.
  • Re: loss of control risks specifically, certain important kinds of misalignment (reward hacking, scheming) are difficult to 'select against' at the behavioural level. A fallback for this would be if LLMs had an innate 'generalization propensity' to learn aligned policies over misaligned ones. 

This motivates research into LLM inductive biases. Or as I'll call them from here on, 'generalization propensities'.

I have two high-level goals:

  1. Understanding the complete set of causal factors that drive generalization.
  2. Controlling generalization by intervening on these causal factors in a principled way. 

Defining "generalization propensity" 

To study generalization propensities, we need two things:

  1. "Generalization propensity evaluations" (GPEs)
  2. [...]

---

Outline:

(00:18) Why study generalization?

(01:30) Defining generalization propensity

(02:29) Research questions

---

First published:

November 14th, 2025

Source:

https://www.lesswrong.com/posts/ZSQaT2yxNNZ3eLxRd/understanding-and-controlling-llm-generalization

---

Narrated by TYPE III AUDIO.
