
Audio note: this article contains 288 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.
TL;DR: This post derives an upper bound on the generalization error for Bayesian learning on neural networks. Unlike the bound from vanilla Singular Learning Theory (SLT), this bound also holds for out-of-distribution generalization, not just for in-distribution generalization. Along the way, it shows some connections between SLT and Algorithmic Information Theory (AIT).
Written at Goodfire AI.
Introduction
Singular Learning Theory (SLT) describes Bayesian learning on neural networks. But it currently has some limitations. One of these limitations is that it assumes the model's training data are independent and identically distributed (IID) samples from some distribution, which makes it difficult to use SLT to describe out-of-distribution (OOD) generalization. If we train a model to classify pictures of animals taken outdoors, vanilla SLT [...]
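As a point of reference (not part of the narrated excerpt, and using standard SLT notation rather than symbols from the post itself): vanilla SLT's in-distribution statement is roughly that, for a model trained on _n_ IID samples _D_n_, the Bayes generalization error _G_n = \mathbb{E}_{x \sim q}\left[\log \frac{q(x)}{p(x \mid D_n)}\right]_ satisfies, under Watanabe's regularity and realizability assumptions,

_\mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),_

where _q_ is the true data distribution, _p(\cdot \mid D_n)_ is the Bayes predictive distribution, and _\lambda_ is the learning coefficient (RLCT). Because both the training sample _D_n_ and the test point _x_ are drawn from the same distribution _q_, this bound says nothing about OOD inputs, which is the gap the post addresses.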
---
Outline:
(00:52) Introduction
(04:05) Prediction error bounds for a computationally bounded Solomonoff induction
(04:11) Claim 1: We can import Solomonoff induction into the learning-theoretic setting
(06:50) Claim
(08:16) High-level proof summary
(09:34) Claim 2: A bounded induction still efficiently predicts efficiently predictable data
(09:42) Setup
(10:54) Claim
(13:04) Claim 3: The bounded induction is still somewhat invariant under our choice of UTM
(15:12) Prediction error bound for Bayesian learning on neural networks
(15:17) Claim 4: We can obtain a similar generalization bound for Bayesian learning on neural networks
(15:32) Setup
(16:59) Claim
(18:36) High-level proof summary
(19:14) Comments
(20:21) Relating the volume to SLT quantities
(23:10) Open problems and questions
(23:28) How do the priors actually relate to each other?
(24:08) Conjecture 1
(24:58) Conjecture 2 (Likely false for arbitrary NN architectures)
(27:20) What does _C(w^*, \epsilon, f)_ look like in practice?
(28:24) Acknowledgments
The original text contained 16 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong