The Nonlinear Library

LW - Distilling Singular Learning Theory by Liam Carroll


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Distilling Singular Learning Theory, published by Liam Carroll on June 16, 2023 on LessWrong.
TL;DR: In this sequence I distill Sumio Watanabe's Singular Learning Theory (SLT) by explaining the essence of its main theorem - Watanabe's Free Energy Formula for Singular Models - and illustrating its implications with intuition-building examples. I then show why neural networks are singular models, and demonstrate how SLT provides a framework for understanding phases and phase transitions in neural networks. Finally, I outline a research agenda for applying the insights of SLT to AI alignment.
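For orientation, the free energy formula can be stated informally as follows (my paraphrase; the precise hypotheses and notation are developed over the course of the sequence). Writing L_n(w) for the empirical negative log likelihood of n samples and w_0 for a true parameter, the Bayesian free energy F_n = -\log \int e^{-n L_n(w)} \varphi(w) \, dw expands as

$$F_n = n L_n(w_0) + \lambda \log n + O(\log \log n),$$

where λ is the real log canonical threshold (RLCT) of the model. For regular models λ = d/2 with d the number of parameters, recovering the classical BIC; for singular models λ can be much smaller, which is one precise sense in which singular models are simpler than their parameter count suggests.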
Epistemic status: The core theorems of Singular Learning Theory have been rigorously proven and published by Sumio Watanabe across 20 years of research. Precisely what the theory says about modern deep learning, and how it might apply to alignment, remain speculative.
Acknowledgements: This sequence has been produced with the support of a grant from the Long Term Future Fund. I'd like to thank all of the people who have given me feedback on each post: Ben Gerraty, Jesse Hoogland, Matthew Farrugia-Roberts, Luke Thorburn, Rumi Salazar, Guillaume Corlouer, and in particular my supervisor and editor-in-chief Daniel Murfet.
Theory vs Examples: The sequence is a mixture of synthesising the main theoretical results of SLT, and providing simple examples and animations that illustrate its key points. As such, some theory-based sections are slightly more technical. Some readers may wish to skip ahead to the intuitive examples and animations before diving into the theory - these are clearly marked in the table of contents of each post.
Prerequisites: Anybody with a basic grasp of Bayesian statistics and multivariable calculus should have no problem understanding the key points. Importantly, although SLT reveals a deep relationship between algebraic geometry and statistical learning, no prior knowledge of algebraic geometry is required to understand this sequence - I will merely gesture at this relationship. Jesse Hoogland wrote an excellent introduction to SLT which serves as a high-level overview of the ideas I will discuss here, and is thus recommended pre-reading for this sequence.
SLT Workshop: I have prepared the sequence with the Workshop on Singular Learning Theory and Alignment in mind (original announcement here). For those attending the virtual Primer from June 19th-24th 2023, this work serves as a useful companion piece. If you haven't signed up yet and find this sequence interesting, consider attending!
Thesis: The sequence is derived from my recent master's thesis, which you can read about on my website.
Introduction
Knowledge to be discovered [in a statistical model] corresponds to a singularity.
If a statistical model is devised so that it extracts hidden structure from a random phenomenon, then it naturally becomes singular.
Sumio Watanabe
In 2009, Sumio Watanabe wrote these two profound statements in his groundbreaking book Algebraic Geometry and Statistical Learning Theory, where he proved the first main results of Singular Learning Theory (SLT). To this day, this work has gone largely under-appreciated by the AI community, probably because it is rooted in highly technical algebraic geometry and distribution theory. On top of this, the theory is framed in the Bayesian setting, which contrasts with the SGD-based setting of modern deep learning.
But this is a crying shame, because SLT has a lot to say about why neural networks, which are singular models, are able to generalise well in the Bayesian setting, and it is very possible that these insights carry over to modern deep learning.
At its core, SLT shows that the loss landscape of singular models, the KL divergence K(w), is fundamentally different to that of regular models like linear regression, consisting of flat valley...
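For reference, in standard SLT notation (my gloss, not a quote from the sequence): given a true distribution q(x) and a model p(x|w) parametrised by w ∈ W, the loss landscape referred to here is the Kullback-Leibler divergence between truth and model,

$$K(w) = \int q(x) \log \frac{q(x)}{p(x \mid w)} \, dx.$$

A model is regular when the set of true parameters W_0 = { w : K(w) = 0 } is a single point at which the Fisher information matrix is positive definite; it is singular when W_0 is a larger set, possibly with self-intersections and other singularities, on which the Fisher information degenerates, so that K(w) admits no quadratic (Gaussian) approximation near its zeros.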