April 07, 2023

AF - Beren's "Deconfusing Direct vs Amortised Optimisation" by Cinera Verinia

6 minutes

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Beren's "Deconfusing Direct vs Amortised Optimisation", published by Cinera Verinia on April 7, 2023 on The AI Alignment Forum.

Preamble

I highly recommend @beren's "Deconfusing Direct vs Amortised Optimisation". It's a very important conceptual clarification that has changed how I think about many issues bearing on technical AI safety.

Currently, it's the most important blog post I've read this year.

This sequence (if I get around to completing it) is an attempt to draw more attention to Beren's conceptual frame and its implications for how to think about issues of alignment and agency.

This first post presents a distillation of the concept, and subsequent posts explore its implications.

Two Approaches to Optimisation

Beren introduces a taxonomy categorising intelligent systems according to the kind of optimisation they perform. I think it's more helpful to think of these as two ends of a spectrum rather than as discrete categories; sophisticated real-world intelligent systems (e.g. humans) appear to be a hybrid of the two approaches.

Direct Optimisers

Systems that perform inference by directly choosing actions to optimise some objective function

Responses are computed on the fly and individually for each input

Direct optimisers perform inference by answering the question: "what action maximises or minimises this objective function ([discounted] cumulative reward and loss respectively)?"

Examples: AIXI, MCTS, model-based reinforcement learning, other "planning" systems

Naively, direct optimisers can be understood as computing (an approximation of) argmax (or argmin) for a suitable objective function during inference.
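As a rough illustration of the "argmax during inference" picture, here is a minimal sketch. The names `direct_optimise` and `toy_objective`, the candidate grid, and the quadratic objective are all hypothetical choices made for this example, not something taken from Beren's post. The point is only that the search runs afresh for every input.

```python
from typing import Callable, Iterable


def direct_optimise(
    state: float,
    actions: Iterable[float],
    objective: Callable[[float, float], float],
) -> float:
    """Search the candidate actions and return the one that maximises
    the objective for this particular state (a brute-force argmax)."""
    return max(actions, key=lambda action: objective(state, action))


def toy_objective(state: float, action: float) -> float:
    """Hypothetical objective: highest when the action equals 2 * state."""
    return -(action - 2.0 * state) ** 2


# Hypothetical candidate set: a grid from -5.0 to 5.0 in steps of 0.1.
candidate_actions = [i / 10 for i in range(-50, 51)]

# Nothing is learned or cached: every new state triggers a new search.
best_action = direct_optimise(1.5, candidate_actions, toy_objective)
print(best_action)
```

Real direct optimisers such as MCTS replace the exhaustive scan with a guided search, but the shape is the same: the objective is consulted at inference time.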

Amortised Optimisers

Systems that learn to approximate a function during training and perform inference by evaluating the output of the learned function on their inputs.

The function approximator is learned from a dataset of input data and successful solutions

Amortised optimisation converts an inference problem to a supervised learning problem

It's called "amortised optimisation" because while learning the policy is expensive, the cost of inference is amortised over all evaluations of the learned policy

Amortised optimisers can be seen as performing inference by answering the question "what output (e.g. action, probability distribution over tokens) does this learned function (policy, predictive model) return for this input (agent state, prompt)?"

Examples: model-free reinforcement learning, LLMs, most supervised & self-supervised(?) learning systems

Naively, amortised optimisers can be understood as evaluating a (fixed) learned function; they're not directly computing argmax (or argmin) for any particular objective function during inference.
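The contrast can be sketched in the same toy setting. Everything here is an assumption made for illustration: `solve_directly` stands in for an expensive search, `fit_linear_policy` is a plain least-squares fit standing in for training, and the linear form of the policy is chosen only because the toy solutions happen to be linear in the state.

```python
from typing import Callable, List, Tuple


def solve_directly(state: float) -> float:
    """Hypothetical expensive search: brute-force argmax over a grid."""
    candidates = [i / 10 for i in range(-100, 101)]
    return max(candidates, key=lambda a: -(a - 2.0 * state) ** 2)


def fit_linear_policy(
    dataset: List[Tuple[float, float]],
) -> Callable[[float], float]:
    """Least-squares fit of action = w * state + b to (state, solution) pairs."""
    n = len(dataset)
    mean_x = sum(x for x, _ in dataset) / n
    mean_y = sum(y for _, y in dataset) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in dataset)
    var = sum((x - mean_x) ** 2 for x, _ in dataset)
    w = cov / var
    b = mean_y - w * mean_x
    return lambda state: w * state + b


# Training: pay the search cost once per training input to build
# a dataset of inputs and successful solutions.
train_states = [-2.0, -1.0, 0.0, 1.0, 2.0]
dataset = [(s, solve_directly(s)) for s in train_states]
policy = fit_linear_policy(dataset)

# Inference: a single function evaluation, with no search and no
# reference to the objective.
action = policy(1.5)
print(action)
```

After training, `policy` never consults the objective again; it only returns whatever the fitted function gives for the input, which is why its quality depends on how well the training data covers the inputs it later sees.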

Differences

Aspect | Direct Optimization | Amortized Optimization
Problem Solving | Computes optimal responses "on the fly" | Evaluates the learned function approximator on the given input
Computational Approach | Searches through a solution space | Learns a function approximator
Runtime Cost | Higher, as it requires in-depth search for a suitable solution | Lower, as it only needs a forward pass through the function approximator
Scalability with Compute | Scales by expanding search depth | Scales by better approximating the posterior distribution
Convergence | In the limit of arbitrary compute, the system's policy converges to argmax (or argmin) of the appropriate objective function | In the limit of arbitrary compute, the system's policy converges to the best description of the training dataset
Performance | More favourable in "simple" domains | More favourable in "rich" domains
Data Efficiency | Little data needed for high performance (e.g. an MCTS agent can attain strongly superhuman performance in Chess/Go given only the rules and sufficient compute) | Requires (much) more data for high performance (e.g. an amortised agent necessarily needs to observe m...


By The Nonlinear Fund
