The Nonlinear Library

AF - VC Theory Overview by Joar Skalse


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: VC Theory Overview, published by Joar Skalse on July 2, 2023 on The AI Alignment Forum.
In this post, I will give a brief overview of VC theory and computational learning theory. The reason for this is that I will later write a post about singular learning theory, in which I will refer back to this post. However, this post is self-contained, and may be of interest to anyone who wants to learn about theoretical machine learning.
What is Computational Learning Theory?
In short, computational learning theory (CLT) is what happens when you combine complexity theory with machine learning. Just as complexity theory is concerned with understanding what kinds of problems are easy or hard to compute, CLT is concerned with understanding what kinds of things are easy or hard to learn. This means that CLT seeks to identify classes of problems that you can learn with a small amount of data, and classes of problems that no algorithm can learn unless it gets a large amount of data. This also means that CLT produces generalisation bounds for different kinds of learning algorithms, and other such results.
Probably Approximately Correct Learning
In order to study what kinds of problems are hard to learn, and easy to learn, we first need a mathematical formalism that captures what it means to "learn a problem". The most popular model in CLT is given by the probably approximately correct (PAC) learning framework. This is a model of supervised learning for classification. It uses the following components:
We have an instance space X, which contains all the data that we might observe. For example, X may be a vector space, or it may be the set of all bit strings, etc. If we are doing image classification, then X may be the set of all RGB images of a certain size, and so on.
We have a set of concepts C, where every element of C is a function c : X -> {0,1}. This is the set of functions that we might be trying to learn. For example, if X is a vector space, then C could be the set of all linear classifiers on X. Alternatively, if X is {0,1}^n, then C could be the set of all Boolean functions over n variables with circuit complexity ≤ m. And so on.
We assume that we have a (potentially unknown) distribution D over X, and that there is an unknown function c : X -> {0,1} that we are trying to learn. To do this, our learning algorithm will observe a number of data points (x,c(x)), where each x is sampled iid from D. Based on this dataset, it will guess a function h : X -> {0,1}. We want this function to be similar to c with high probability.
Specifically, given a distribution D over X, a concept c \in C, and a hypothesis h : X -> {0,1}, we say that the error of h is P(h(x) != c(x)), where x is sampled from D.
A learning algorithm L is an algorithm that takes as input a dataset {(x, c(x))} of points, and returns a function h : X -> {0,1}.
We say that L is a PAC-learning algorithm for C if there exists a polynomial p such that for all ϵ, δ \in (0,1), all distributions D over X, and all concepts c \in C, if L is given access to at least p(1/ϵ,1/δ) samples from D, together with their labels according to c, then L will with probability at least 1−δ learn a function whose error is at most ϵ. Moreover, if there exists a PAC-learning algorithm for C, then we say that C is PAC-learnable.
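To make these definitions concrete, here is a small toy instantiation (my own sketch, not something from the post): the instance space X is the interval [0,1], D is the uniform distribution on [0,1], the concept class C consists of all threshold functions c_t(x) = 1 iff x ≥ t, and the learning algorithm L returns the threshold halfway between the largest negative example and the smallest positive example it has seen. The sample size n = (1/ϵ) ln(2/δ) used below is a standard bound for this particular concept class, not a general formula.

```python
import math
import random

def sample_dataset(t, n, rng):
    """Draw n points x ~ D = Uniform[0, 1] iid and label them with the
    target concept c_t(x) = 1 iff x >= t."""
    return [(x, int(x >= t)) for x in (rng.random() for _ in range(n))]

def learner(dataset):
    """A simple learning algorithm L: return a threshold hypothesis t_hat
    halfway between the largest negative and smallest positive example."""
    lo = max((x for x, y in dataset if y == 0), default=0.0)
    hi = min((x for x, y in dataset if y == 1), default=1.0)
    return (lo + hi) / 2

def true_error(t_hat, t):
    """error(h) = P_{x ~ D}(h(x) != c_t(x)). For two thresholds under
    Uniform[0, 1] this is exactly the length of the disagreement interval."""
    return abs(t_hat - t)

# Empirical check of the PAC guarantee: with n >= (1/eps) * ln(2/delta)
# samples, the learned hypothesis should have error <= eps with probability
# at least 1 - delta (over the random draw of the dataset), for every c_t.
eps, delta = 0.05, 0.05
n = math.ceil((1 / eps) * math.log(2 / delta))
rng = random.Random(0)
trials, failures = 2000, 0
for _ in range(trials):
    t = rng.random()                        # an arbitrary unknown concept
    t_hat = learner(sample_dataset(t, n, rng))
    failures += true_error(t_hat, t) > eps
print(f"n = {n}: error > {eps} in {failures}/{trials} trials "
      f"(delta allows up to {int(delta * trials)})")
```

Running this check, the fraction of trials in which the learned hypothesis has error above ϵ should stay below δ, which is exactly the "probably approximately correct" guarantee.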
In other words, PAC-learning is a mathematical formalism that describes supervised learning of classification functions. C is a set of classification functions that we may be trying to learn. We want to know whether there exists a supervised learning algorithm L, such that if L is given access to a small (polynomial) amount of training data, then it will with high probability learn a function with low error. We want this bound to hold for all c \in C, and all D over X, so that we can be sure that this bound holds e...