The Nonlinear Library: Alignment Forum

AF - Causality and a Cost Semantics for Neural Networks by scottviteri




Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Causality and a Cost Semantics for Neural Networks, published by scottviteri on August 21, 2023 on The AI Alignment Forum.
Epistemic status: I time-boxed this idea to three days of effort, so any calculations are pretty sloppy, and I haven't looked into any related work. I probably could have done much better if I knew anything about circuit complexity. There are some TODOs and an unfinished last section. If you are interested in this content and want to pick up where I left off, I'll gladly add you as a collaborator on this post.
Here is a "tech tree" for neural networks. I conjecture (based on admittedly few experiments) that the simplest implementation of any node in this tree includes an implementation of its parents, given that we are writing programs starting from the primitives +, ×, and relu. An especially surprising relationship (to me) is that "if statements" are best implemented downstream of division.
Introduction
In a discussion with my friend Anthony Corso, an intriguing idea arose: maybe we can define whether program $p_1$ "causes" program $p_2$ in the following way. Given a neural network that mimics $p_1$, how easy is it to learn a neural network that mimics the behavior of $p_2$? This framing is appealing because it treats causality as a question about two arbitrary programs and reduces it to a problem of program complexity.
Suppose that $p_1$ and $p_2$ are written in a programming language $P$, and let $P(\mathrm{ops})$ denote $P$ extended with ops as primitive operations. We define a complexity function $C: P(\mathrm{ops}) \to \mathbb{R}$, which takes a program in the extended language and returns a real number representing the program's complexity under some fixed notion of complexity. The degree to which $p_1$ "causes" $p_2$ is then the minimum complexity achievable by a program $p$ from $P(\mathrm{ops} + p_1)$ such that $p$ is extensionally equal to $p_2$ (equal on all inputs). If $P_2$ is the set of all $p \in P(\mathrm{ops} + p_1)$ that are extensionally equal to $p_2$, then $\mathrm{causes}(p_1, p_2) = \min_{p \in P_2} C(p)$. We can also use this definition in the approximate case, taking the minimum complexity over programs $p$ such that $\mathbb{E}[(p(x) - p_2(x))^2] < \varepsilon$ with respect to some L1-integrable probability measure.
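As a rough illustration of the approximate definition, the Python sketch below assumes we already have a finite list of candidate programs from $P(\mathrm{ops} + p_1)$, each paired with a hand-computed complexity. It estimates the expected squared error by Monte Carlo over sampled inputs. The names `causes_approx` and `mse`, the uniform input distribution, and the candidate list are all hypothetical; the post does not say how candidates would be enumerated or searched.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: a candidate program is its complexity C(p)
# paired with a Python callable that implements it.
Candidate = Tuple[float, Callable[[float], float]]


def mse(p: Callable[[float], float], p2: Callable[[float], float], xs: List[float]) -> float:
    """Monte Carlo estimate of E[(p(x) - p2(x))^2] over the sampled inputs xs."""
    return sum((p(x) - p2(x)) ** 2 for x in xs) / len(xs)


def causes_approx(candidates: List[Candidate], p2: Callable[[float], float],
                  xs: List[float], eps: float) -> Optional[float]:
    """Minimum complexity among candidates whose estimated squared error to p2 is below eps.

    Returns None when no candidate is close enough to p2.
    """
    costs = [c for c, p in candidates if mse(p, p2, xs) < eps]
    return min(costs) if costs else None


if __name__ == "__main__":
    random.seed(0)
    xs = [random.uniform(-5.0, 5.0) for _ in range(1000)]  # assumed input distribution
    relu = lambda x: max(x, 0.0)
    candidates = [
        (1.0, relu),                                 # relu x: wrong for negative x
        (4.0, lambda x: relu(x) + relu(-1.0 * x)),   # (+ (relu x) (relu (× -1 x)))
    ]
    print(causes_approx(candidates, abs, xs, eps=1e-9))  # 4.0
```

The costs 1 and 4 are counted by hand with one unit per primitive occurrence. Sampling only approximates the expectation, so this checks closeness on the sampled inputs rather than proving it.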
We can define a particular complexity function $C$ that represents the cost of executing a program. We can estimate this quantity by examining the program's abstract syntax tree (AST) against a cost model for the language's primitive operations. For this exploration, we have chosen the lambda calculus as the language. The lambda calculus is a minimalist, Lisp-like language with a single type, which in our case we will think of as floating-point numbers. The notation is simple: lambda abstraction is written λ x. x, and function application is written (f g), which most other languages would write as f(g).
How I Would Like People to Engage with this Work
By writing Ops in your favorite programming language
By circumventing my proposed tech tree: reaching a child without reaching its parent, using a fewer or equal number of operations
By training some neural networks between these programs, and seeing how difficult it is to learn one program after pre-training on another
Cost Semantics
Definition
We define the cost of operations and expressions in the following manner:
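$\mathrm{Ops}(\mathrm{op}) = 1$, for any operation op in ops
$\mathrm{Ops}(c) = 0$, for any floating-point constant $c$
$\mathrm{Ops}(x) = 0$, for any variable $x$
$\mathrm{Ops}(\lambda x.\, e) = \mathrm{Ops}(e)$
$\mathrm{Ops}(f\ g) = \mathrm{Ops}(f) + \mathrm{Ops}(g)$
For operations of higher arity, we have $\mathrm{Ops}(\mathrm{op}\ x_1 \ldots x_n) = \mathrm{Ops}(\mathrm{op}) + \sum_i \mathrm{Ops}(x_i)$.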
The primitive operations we select for neural networks are ops = {+, ×, relu}.
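The cost rules above are mechanical enough to transcribe directly. The following Python sketch assumes a small hypothetical AST (`Const`, `Var`, `Prim`, `Lam`, `App`) and writes × as `"*"`; it is one possible encoding, not the author's. Multi-argument operations are curried by the hypothetical helper `apply_op`, so the higher-arity rule falls out of the application rule instead of needing its own case.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical AST for the lambda-calculus fragment used in the post.
@dataclass(frozen=True)
class Const:          # floating-point constant c
    value: float

@dataclass(frozen=True)
class Var:            # variable x
    name: str

@dataclass(frozen=True)
class Prim:           # primitive operation from ops
    name: str

@dataclass(frozen=True)
class Lam:            # abstraction (λ x. e)
    param: str
    body: "Expr"

@dataclass(frozen=True)
class App:            # application (f g)
    fn: "Expr"
    arg: "Expr"

Expr = Union[Const, Var, Prim, Lam, App]

OPS = {"+", "*", "relu"}  # ops = {+, ×, relu}, with "*" standing for ×


def cost(e: Expr) -> int:
    """Ops e: one unit per occurrence of a primitive operation."""
    if isinstance(e, Prim):
        if e.name not in OPS:
            raise ValueError(f"unknown primitive {e.name!r}")
        return 1                          # Ops op = 1
    if isinstance(e, (Const, Var)):
        return 0                          # Ops c = 0, Ops x = 0
    if isinstance(e, Lam):
        return cost(e.body)               # Ops (λx.e) = Ops e
    if isinstance(e, App):
        return cost(e.fn) + cost(e.arg)   # Ops (f g) = Ops f + Ops g
    raise TypeError(f"not an expression: {e!r}")


def apply_op(op: str, *args: Expr) -> Expr:
    """Build the curried application (op x1 ... xn)."""
    e: Expr = Prim(op)
    for a in args:
        e = App(e, a)
    return e


# neg = λ x. (× -1 x)
neg = Lam("x", apply_op("*", Const(-1.0), Var("x")))
print(cost(neg))  # 1
```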
Basic Operations and Warm-Up
Let's take a few examples to demonstrate this cost calculus:
To derive subtraction, we first define negation, neg:
$\mathrm{Ops}(\mathrm{neg}) = \mathrm{Ops}(\lambda x.\,(\times\ {-1}\ x)) = \mathrm{Ops}(\times\ {-1}\ x) = \mathrm{Ops}(\times) + \mathrm{Ops}(-1) + \mathrm{Ops}(x) = 1 + 0 + 0 = 1$
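Cost alone says nothing about what a term computes, while the causality definition also relies on extensional equality. A minimal call-by-value evaluator lets us check that the neg term really behaves as x ↦ -x on sample inputs. The Python sketch below assumes a hypothetical tuple encoding of terms and curried primitives. The function name `evaluate` and the table `PRIMS` are illustrative only. Checking a few inputs is a sanity test, not a proof of extensional equality.

```python
from typing import Any, Dict

# Hypothetical tuple encoding of lambda terms:
#   ("const", c) | ("var", x) | ("prim", op) | ("lam", x, body) | ("app", f, g)
Term = tuple

# Curried primitives for ops = {+, ×, relu}; "*" stands for ×.
PRIMS: Dict[str, Any] = {
    "+": lambda a: lambda b: a + b,
    "*": lambda a: lambda b: a * b,
    "relu": lambda a: max(a, 0.0),
}


def evaluate(t: Term, env: Dict[str, Any]) -> Any:
    """Call-by-value evaluation of a term to a float or a Python closure."""
    tag = t[0]
    if tag == "const":
        return t[1]
    if tag == "var":
        return env[t[1]]
    if tag == "prim":
        return PRIMS[t[1]]
    if tag == "lam":
        _, param, body = t
        # A lambda becomes a closure that extends the environment with its argument.
        return lambda v: evaluate(body, {**env, param: v})
    if tag == "app":
        return evaluate(t[1], env)(evaluate(t[2], env))
    raise ValueError(f"unknown term tag {tag!r}")


# neg = λ x. (× -1 x)
NEG: Term = ("lam", "x", ("app", ("app", ("prim", "*"), ("const", -1.0)), ("var", "x")))
neg = evaluate(NEG, {})
print(neg(2.5))  # -2.5
```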
The cost of subtraction (-) ...
The Nonlinear Library: Alignment Forum, by The Nonlinear Fund

