The Nonlinear Library

AF - Penalize Model Complexity Via Self-Distillation by research prime space



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Penalize Model Complexity Via Self-Distillation, published by research prime space on April 4, 2023 on The AI Alignment Forum.
When you self-distill a model (i.e., train a new model on the predictions of your old model), the resulting model represents a less complex function. After many rounds of self-distillation, you essentially end up with a constant function. This paper makes the above claim precise.
Anyway, if you apply multiple rounds of self-distillation to a model, it becomes less complex. So if the original model learned complex, power-seeking behaviors that don't help it do well on the training data, those behaviors would likely go away after several rounds of self-distillation. Self-distillation essentially lets you get the minimum-complexity model that still does well on the training data. Thus, I think it's promising from an AI safety standpoint, as sketched below.
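The following is a minimal sketch of iterated self-distillation under assumed choices not in the original post: a small PyTorch regression model, synthetic data, and placeholder hyperparameters. It only illustrates the loop structure, where each round trains a fresh model of the same architecture on the previous model's predictions.

```python
# Iterated self-distillation sketch (illustrative; architecture, data, and
# hyperparameters are hypothetical placeholders, not from the original post).
import torch
import torch.nn as nn

def make_model():
    # Same architecture every round; only the training targets change.
    return nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))

def train(model, x, y, steps=2000, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return model

torch.manual_seed(0)
x = torch.linspace(-1, 1, 200).unsqueeze(1)
y = torch.sin(3 * x) + 0.1 * torch.randn_like(x)  # noisy ground-truth targets

# Round 0: fit the original training data.
teacher = train(make_model(), x, y)

# Rounds 1..k: train each new model on the old model's predictions.
for round_idx in range(5):
    with torch.no_grad():
        soft_targets = teacher(x)  # predictions of the previous model
    teacher = train(make_model(), x, soft_targets)
```

Each round can only pass along structure the previous model actually represents, which is one intuition for why repeated self-distillation drives the learned function toward something simpler.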
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.