The Nonlinear Library

AF - Challenge: construct a Gradient Hacker by Thomas Larsen



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Challenge: construct a Gradient Hacker, published by Thomas Larsen on March 9, 2023 on The AI Alignment Forum.
This is a relatively clean subproblem that we came upon a few months ago while thinking about gradient hacking. We're throwing it out to the world to see if anyone can make progress.
Problem: Construct a gradient hacker (definition below), or prove that one cannot exist under the given conditions.
Setup: Let $x \in \mathbb{R}^d$ be the network inputs, $\theta \in \mathbb{R}^w$ be the network's trainable parameters, $\hat{y} \in \mathbb{R}$ be the network output, and $f(x, \theta) = \hat{y}$ be the network architecture. $f(x, \theta)$ must be a feedforward neural network that reads in its own weights as input. Weight sharing is allowed (vertical or horizontal), and some weight sharing is necessary in order to take all $w$ weights as input.
Suppose that we train this network on a dataset $D = \{x^{(i)}, y^{(i)}\}$ with MSE loss $L(\theta) = \frac{1}{n} \sum_i (f(x^{(i)}, \theta) - y^{(i)})^2$, assuming gradient flow: $\theta'(t) = -\nabla_\theta L(\theta(t))$. Given an initial point $\theta_0$, the training dynamics are therefore $\theta(0) = \theta_0$, $\theta'(t) = -\nabla_\theta L(\theta(t))$.
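The gradient-flow dynamics above can be sketched numerically. The snippet below is a toy illustration, not part of the original post: `f`, `mse_loss`, and `grad` are hypothetical names, the tiny weight-reading network is an arbitrary choice, and forward Euler with finite-difference gradients stands in for the continuous flow and backprop.

```python
import numpy as np

# A minimal feedforward map that also reads its own weights:
# the hidden unit mixes the input x with theta's own sum.
# (This toy f is illustrative, not a gradient hacker.)
def f(x, theta):
    h = np.tanh(theta[0] * x + theta[1] * theta.sum())
    return theta[2] * h

# MSE loss L(theta) = (1/n) * sum_i (f(x_i, theta) - y_i)^2
def mse_loss(theta, xs, ys):
    preds = np.array([f(x, theta) for x in xs])
    return np.mean((preds - ys) ** 2)

# Central finite differences stand in for exact gradients.
def grad(theta, xs, ys, eps=1e-6):
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (mse_loss(theta + e, xs, ys) - mse_loss(theta - e, xs, ys)) / (2 * eps)
    return g

# Integrate the gradient flow theta'(t) = -grad L with forward Euler.
xs = np.array([0.0, 0.5, 1.0])
ys = np.array([0.0, 0.25, 1.0])
theta = np.array([0.5, 0.1, 1.0])
dt = 0.01
for _ in range(5000):
    theta -= dt * grad(theta, xs, ys)

print(mse_loss(theta, xs, ys))  # loss after integrating the flow
```

In the discrete limit this is just gradient descent with a small step size; the continuous flow in the problem statement is the $dt \to 0$ limit.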
Definition: A tuple $(f, \theta_g, D)$ is a gradient hacker if the following conditions hold:
1. $\theta_g$ is not a global optimum: $\exists \theta^*$ such that $L(\theta^*) < L(\theta_g)$.
2. The network's training converges to $\theta_g$ from any nearby initialization: $\exists \epsilon > 0$ such that $\forall \theta_0$ with $\|\theta_0 - \theta_g\| < \epsilon$, $\lim_{t \to \infty} \theta(t) = \theta_g$.
3. There is internal structure of $f(\cdot, \theta_0)$ that computes the gradient; i.e., there is some subset of the activations whose mean over every datapoint in $D$ is $\nabla_\theta L(\theta_0)$.
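The third condition can be made concrete with a small checker. This is a hypothetical sketch, not from the post: `check_gradient_condition` and its arguments are illustrative names, and the toy activations are made-up numbers whose mean happens to equal the candidate gradient.

```python
import numpy as np

# Hypothetical check of condition 3: given one candidate activation
# vector per datapoint (each the same dimension w as theta), test
# whether their mean over the dataset D equals the true gradient.
def check_gradient_condition(activations_per_datapoint, true_gradient, tol=1e-6):
    # activations_per_datapoint has shape (n, w): n datapoints, w weights.
    mean_act = activations_per_datapoint.mean(axis=0)
    return bool(np.allclose(mean_act, true_gradient, atol=tol))

# Toy usage: three per-datapoint activation vectors whose mean is (1, 1).
acts = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [2.0, 2.0]])
g = np.array([1.0, 1.0])
print(check_gradient_condition(acts, g))  # True
```

Note that this checks the "mean over the dataset" reading of the condition; the variations listed later in the post (per-input equality) would drop the `mean` and compare each row of `acts` to the per-input gradient instead.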
This captures my intuition that a gradient hacker knows where it wants to go (in this case, "get to $\theta_g$"), and then decides what to output in order to produce the gradient that takes it there.
Some more ambitious problems (if gradient hackers exist):
Characterize the set of all gradient hackers.
Show that they all must satisfy some property.
Construct gradient hackers for arbitrarily large n, d, w, and neural net depth.
Variations on the problem: a subset of the activations equals $\nabla_\theta L(\theta_0)$ for every input, or the subset of activations corresponds to the gradient on that particular input.
This is a bit strict, but we didn't want to list different ways something could be isomorphic to the gradient.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.