November 08, 2025

“Omniscaling to MNIST” by cloud

20 minutes

In this post, I describe a mindset that is flawed, and yet helpful for choosing impactful technical AI safety research projects.

The mindset is this: future AI might look very different from AI today, but good ideas are universal. If you want to develop a method that will scale up to powerful future AI systems, your method should also scale down to MNIST. In other words, good ideas omniscale: they work well across all model sizes, domains, and training regimes.

The Modified National Institute of Standards and Technology database (MNIST): 70,000 images of handwritten digits, 28x28 pixels each (source: Wikipedia). You can fit the whole dataset and many models on a single GPU!
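
As a rough check on the single-GPU claim, here is a back-of-the-envelope sketch of how much memory the pixel data takes. The storage types are assumptions (one byte per pixel for raw images, four bytes per pixel once converted to float32 for training); the post itself does not specify them, and labels and model weights are not counted.

```python
# Back-of-the-envelope memory footprint of the MNIST pixel data.
# Assumption: 1 byte per pixel (uint8) for raw images,
# 4 bytes per pixel (float32) after conversion for training.
NUM_IMAGES = 70_000
HEIGHT = WIDTH = 28


def mnist_bytes(bytes_per_pixel: int) -> int:
    """Total bytes needed to hold every pixel of every image."""
    return NUM_IMAGES * HEIGHT * WIDTH * bytes_per_pixel


raw_bytes = mnist_bytes(1)    # uint8 storage
float_bytes = mnist_bytes(4)  # float32 storage

print(f"uint8:   {raw_bytes / 1e6:.1f} MB")
print(f"float32: {float_bytes / 1e6:.1f} MB")
```

Under these assumptions the whole dataset comes to roughly 55 MB raw, or about 220 MB as float32, which is small next to the memory of a typical modern GPU.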

Putting the omniscaling mindset into practice is straightforward. Any time you come across a clever-sounding machine learning idea, ask: "can I apply this to MNIST?" If not, then it's not a good idea. If so, run an experiment to see if it works. If it doesn't, then it's not a good idea. If it does, then it might be a good idea, and you can continue as usual to more realistic experiments or theory.
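
The decision procedure above can be sketched as a small function. This is only an illustration of the paragraph's logic: the names `omniscaling_check`, `applies_to_mnist`, and `run_mnist_experiment` are hypothetical, and the experiment is a stand-in callable rather than anything from the post's code.

```python
from typing import Callable


def omniscaling_check(
    applies_to_mnist: bool,
    run_mnist_experiment: Callable[[], bool],
) -> str:
    """Apply the omniscaling filter to a single idea.

    applies_to_mnist: whether the idea can be applied to MNIST at all.
    run_mnist_experiment: hypothetical callable that runs the MNIST
        experiment and returns True if the idea worked.
    """
    if not applies_to_mnist:
        # Cannot be scaled down to MNIST: rejected outright.
        return "not a good idea"
    if not run_mnist_experiment():
        # Applies to MNIST but fails there: rejected.
        return "not a good idea"
    # Passing is necessary, not sufficient: continue to more
    # realistic experiments or theory.
    return "might be a good idea"


# Example: an idea that applies to MNIST and whose experiment succeeds.
verdict = omniscaling_check(True, lambda: True)
print(verdict)
```

Note that the experiment is only run when the idea applies to MNIST in the first place, and that the best possible verdict is "might be a good idea".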

In this post, I will:

Share how MNIST experiments have informed my [...]

---

Outline:

(01:58) Applications to MNIST

(02:42) Gradient routing

(04:43) Distillation robustifies unlearning

(08:39) Subliminal learning

(10:37) Why you should do it on MNIST

(11:30) MNIST is not sufficient (and other tips)

(14:25) The omniscaling assumption is false

(17:09) Code and more ideas

(18:40) Closing thoughts

The original text contained 7 footnotes which were omitted from this narration.

---

First published:

November 8th, 2025

Source:

https://www.lesswrong.com/posts/4aeshNuEKF8Ak356D/omniscaling-to-mnist

---

Narrated by TYPE III AUDIO.

By LessWrong
