Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Estimating effective dimensionality of MNIST models, published by Arjun Panickssery on November 2, 2023 on LessWrong.
The local learning coefficient λ is a measure of a model's "effective dimensionality" that captures its complexity (more background here). Lau et al. recently described a sampling method (SGLD) that uses noisy gradients to compute a stochastic estimate λ̂ in a computationally tractable way (good explanation here).
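Roughly, the estimator runs SGLD around the trained parameters w*, tempering the loss gradient by nβ with β = 1/log n and adding a localization pull γ(w* − w), then sets λ̂ = nβ(average sampled loss − loss at w*). Below is a minimal sketch of that procedure, assuming PyTorch; the function name, hyperparameter defaults, and loop details are illustrative, not the exact pseudocode or the repo's code.

```python
# Minimal sketch of SGLD-based estimation of the local learning coefficient.
# PyTorch is assumed; names, defaults, and loop details are illustrative.
import copy
import math

import torch
import torch.nn.functional as F


def estimate_lambda_hat(model, loader, device, num_steps=1000, eps=1e-4, gamma=100.0):
    """Estimate lambda-hat for a trained `model` by SGLD sampling around its weights."""
    model = model.to(device)
    n = len(loader.dataset)          # number of training examples
    beta = 1.0 / math.log(n)         # inverse temperature, as in Lau et al.
    w_star = [p.detach().clone() for p in model.parameters()]

    sampler = copy.deepcopy(model)
    losses = []
    batches = iter(loader)
    for _ in range(num_steps):
        try:
            x, y = next(batches)
        except StopIteration:
            batches = iter(loader)
            x, y = next(batches)
        x, y = x.to(device), y.to(device)

        loss = F.cross_entropy(sampler(x), y)     # mean minibatch loss
        sampler.zero_grad()
        loss.backward()

        with torch.no_grad():
            for p, p0 in zip(sampler.parameters(), w_star):
                # Tempered gradient plus a localization pull back toward w*.
                drift = -(n * beta) * p.grad + gamma * (p0 - p)
                p.add_(0.5 * eps * drift + math.sqrt(eps) * torch.randn_like(p))
        losses.append(loss.item())

    # Loss at the trained parameters w* serves as the reference point.
    with torch.no_grad():
        ref = sum(F.cross_entropy(model(x.to(device)), y.to(device)).item()
                  for x, y in loader) / len(loader)

    return n * beta * (sum(losses) / len(losses) - ref)
```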
I present results (GitHub repo) of the "Task Variability" project suggested on the DevInterp project list. To see how degeneracy scales with task difficulty and model class, I trained a fully-connected MLP and a CNN (both with ~120k parameters) on nine MNIST variants, each restricted to a different subset of the labels (just the labels {0, 1}, then {0, 1, 2}, and so on up to all ten digits). All models were trained to convergence using the same number of training data points. I then ran Lau et al.'s estimation algorithm on each of the trained models.
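As a concrete illustration of this setup, here is a sketch of how the label-restricted variants and the two model classes might be constructed, assuming torchvision; the layer sizes are chosen only to land near the ~120k-parameter range and are not the architectures from the repo.

```python
# Illustrative construction of the nine label-restricted MNIST variants and
# the two model classes. Layer sizes are approximate, not the repo's exact ones.
from torch import nn
from torch.utils.data import Subset, DataLoader
from torchvision import datasets, transforms

mnist = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())


def label_subset(dataset, labels):
    """Keep only the examples whose target is in `labels`, e.g. {0, 1, 2}."""
    keep = [i for i, t in enumerate(dataset.targets.tolist()) if t in labels]
    return Subset(dataset, keep)


# Nine variants: labels {0, 1}, then {0, 1, 2}, ..., up to all ten digits.
variants = {k: label_subset(mnist, set(range(k))) for k in range(2, 11)}


def make_mlp(num_classes):
    return nn.Sequential(nn.Flatten(),
                         nn.Linear(784, 140), nn.ReLU(),
                         nn.Linear(140, 64), nn.ReLU(),
                         nn.Linear(64, num_classes))


def make_cnn(num_classes):
    return nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                         nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                         nn.Flatten(),
                         nn.Linear(32 * 7 * 7, 64), nn.ReLU(),
                         nn.Linear(64, num_classes))


loader = DataLoader(variants[10], batch_size=128, shuffle=True)
```

Because each variant keeps a different fraction of MNIST, matching the setup described above would also require subsampling each subset to a common size so that every model sees the same number of training points.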
The reported values are averaged over three runs. For the full MNIST dataset they are comparable to Lau et al.'s results, despite using models ten times smaller and trained for ten times fewer epochs.
The sampling method is finicky and sensitive to the hyperparameter choices of learning rate and noise factor. It fails, producing negative or implausibly large λ̂ values, if the noise ϵ and the distance penalty γ (lines 6 and 7 of Lau et al.'s pseudocode) aren't calibrated to the model's level of convergence. The results show a linear scaling law relating the number of labels to the estimated λ̂. The CNN typically has a lower λ̂ than the MLP, which matches the intuition that some of the complexity is "stored" in the architecture: the convolutions impose a useful prior toward functions that are good at image-recognition tasks.
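One way to guard against that miscalibration is to sweep ϵ and γ and flag estimates that fall outside the range 0 < λ̂ < d/2, since λ is bounded above by d/2 for a d-parameter model (the regular-model value). The sketch below builds on the hypothetical estimate_lambda_hat above; the grid values are arbitrary.

```python
# Hypothetical calibration sweep over the SGLD step size eps and localization
# strength gamma, reusing the estimate_lambda_hat sketch above. Estimates that
# are negative or exceed d/2 indicate a miscalibrated setting.
import itertools


def calibration_sweep(model, loader, device,
                      eps_grid=(1e-5, 1e-4, 1e-3),
                      gamma_grid=(1.0, 10.0, 100.0)):
    d = sum(p.numel() for p in model.parameters())
    results = {}
    for eps, gamma in itertools.product(eps_grid, gamma_grid):
        lam = estimate_lambda_hat(model, loader, device, eps=eps, gamma=gamma)
        sane = 0.0 < lam < d / 2        # lambda should lie in (0, d/2]
        results[(eps, gamma)] = (lam, sane)
    return results
```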
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org