The Nonlinear Library

AF - On Developing a Mathematical Theory of Interpretability by Spencer Becker-Kahn


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Developing a Mathematical Theory of Interpretability, published by Spencer Becker-Kahn on February 9, 2023 on The AI Alignment Forum.
If the trajectory of the deep learning paradigm continues, it seems plausible to me that in order for applications of low-level interpretability to AI not-kill-everyone-ism to be truly reliable, we will need a much better-developed and more general theoretical and mathematical framework for deep learning than currently exists. And this sort of work seems difficult. Doing mathematics carefully - in particular finding correct, rigorous statements and then finding correct proofs of those statements - is slow. So slow that the rate of change of cutting-edge engineering practices significantly worsens the difficulties involved in building theory at the right level of generality. And, in my opinion, much slower than the rate at which we can generate informal observations that might possibly be worthy of further mathematical investigation.
Thus it can feel like the role that serious mathematics has to play in interpretability is primarily reactive, i.e. consists mostly of activities like 'adding' rigour after the fact or building narrow models to explain specific already-observed phenomena.
My impression, however, is that the best applied mathematics doesn’t tend to work like this. My impression is that although the use of mathematics in a given field may initially be reactive and disunited, one of the most lauded aspects of mathematics is a certain inevitability with which our abstractions take on a life of their own and reward us later with insight, generalization, and the provision of predictions. Moreover - remarkably - often those abstractions are found in relatively mysterious, intuitive ways: i.e. not as the result of us just directly asking "What kind of thing seems most useful for understanding this object and making predictions?" but, at least in part, as a result of aesthetic judgement and a sense of mathematical taste.
One consequence of this (which is a downside and also probably partly due to the inherent limitations of human mathematics) is that mathematics does not tend to act as an objective tool that you can bring to bear on whatever question it is that you want to think about. Instead, the very practice of doing mathematics seeks out the questions that mathematics is best placed to answer. It cannot be used to say something useful about just anything; rather, it finds out what it is that it can say something about.
Even after taking into account these limitations and reservations, developing something that I'm clumsily thinking of as 'the mathematics of (the interpretability of) deep learning-based AI' might still be a fruitful endeavour. In case it is not clear, this is, roughly speaking, because a) many people are putting a lot of hope and resources into low-level interpretability; b) its biggest hurdles will be making it 'work' at large scale, on large models, quickly and reliably; and c) - the sentiment I opened this article with - doing this latter thing might well require much more sophisticated general theory.
In thinking about some of these themes, I started to mull over a couple of illustrative analogies or examples. The first - and more substantive example - is algebraic topology. This area of mathematics concerns itself with certain ways of assigning mathematical (specifically algebraic) information to shapes and spaces. Many of its foundational ideas have beautiful informal intuitions behind them, such as the notion that a shape may have enough space in it to contain a sphere, but not enough space to contain the ball that that sphere might have demarcated. Developing these informal notions into rigorous mathematics was a long and difficult process and learning this material - even now when it is presented in its ...
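(An illustrative aside, not part of the original post: one standard way to make the sphere-versus-ball intuition above precise is with singular homology. The choice of example space below is mine, a minimal sketch rather than anything the post commits to.)

Take the punctured space
\[
  X \;=\; \mathbb{R}^3 \setminus \{0\}, \qquad
  S^2 \;=\; \{\, x \in \mathbb{R}^3 : \lVert x \rVert = 1 \,\} \subset X .
\]
Since $X$ deformation retracts onto $S^2$,
\[
  H_2(X) \;\cong\; H_2(S^2) \;\cong\; \mathbb{Z},
\]
so the unit sphere is a 2-cycle in $X$ that is not the boundary of any 3-chain in $X$: the ball it demarcates, $B^3 = \{\, x : \lVert x \rVert \le 1 \,\}$, contains the deleted origin and therefore does not sit inside $X$.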