


Around 10 years ago, a paper came out that arguably killed classical deep learning theory: Zhang et al.'s aptly titled “Understanding deep learning requires rethinking generalization”.
Of course, this is a bit of an exaggeration. No single paper ever kills a field of research on its own, and deep learning theory was not exactly the most productive or healthy field when the paper was published. But if I had to point to a single paper that shattered the optimism of the time, it would be Zhang et al. 2016.[1]
Caption: believe it or not, this unassuming table rocked the field of deep learning theory back in 2016, despite probably involving fewer computational resources than what Claude 4.7 Opus consumed when I clicked the “Claude” button embedded into the LessWrong editor.
—
Let's start by answering a question: what, exactly, do I mean by deep learning theory?
At least in 2016, the answer was: “extending statistical learning theory to deep neural networks trained with SGD, in order to derive generalization bounds that would explain their behavior in practice”.
—
Since its inception in the mid-1980s, statistical learning theory had been the dominant approach for [...]
The original text contained 2 footnotes which were omitted from this narration.
---
Narrated by TYPE III AUDIO.
By LessWrong
