The Nonlinear Library

LW - Parameter Scaling Comes for RL, Maybe by 1a3orn


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Parameter Scaling Comes for RL, Maybe, published by 1a3orn on January 24, 2023 on LessWrong.
TLDR
Unlike language models or image classifiers, past reinforcement learning models did not reliably get better as they got bigger. Two DeepMind RL papers published in January 2023 nevertheless show that with the right techniques, scaling up RL model parameters can increase both the total reward and the sample-efficiency of RL agents -- and by a lot. Returns to scale have been key to making language models powerful and economically valuable; they might also be key for RL, although many important questions remain unanswered.
Intro
Reinforcement learning models often have very few parameters compared to language and image models.
The largest Vision Transformer has 2 billion parameters. GPT-3 has 175 billion. The slimmer Chinchilla, trained in accord with scaling laws emphasizing bigger datasets, has 70 billion.
By contrast, until a month ago, the largest mostly-RL models I knew of were the agents for StarCraft and Dota 2, AlphaStar and OpenAI Five, which had 139 million and 158 million parameters, respectively. And most RL models are far smaller, coming in well under 50 million parameters.
The reason RL hasn't scaled up the size of its models is simple -- doing so generally hasn't made them better.
Increasing model size in RL can even hurt performance. MuZero Reanalyze gets worse on some tasks as you scale up the network size. So does a vanilla Soft Actor-Critic (SAC) agent.
There has been good evidence for scaling model size in somewhat... non-central examples of RL. For instance, offline RL agents trained from expert examples, such as DeepMind's 1.2-billion-parameter Gato or the Multi-Game Decision Transformers, clearly get better with scale. Similarly, RL from human feedback on language models generally shows that larger LMs are better. Hybrid systems such as PaLM SayCan benefit from larger language models. But all these cases sidestep problems central to RL -- they have no need to balance exploration and exploitation in seeking reward.
In the typical RL setting, there has generally been little scaling, and little evidence for the efficacy of scaling (although not no evidence at all).
None of the above means that the compute spent on RL models is small, or that compute scaling does nothing for them. AlphaStar used only a little less compute than GPT-3, and AlphaGo Zero used more, because both of them trained on an enormous number of games. Additional compute predictably improves the performance of RL agents. But, rather than getting a bigger brain, almost all RL algorithms spend this compute either by (1) training on an enormous number of games, or (2) (if concerned with sample-efficiency) revisiting the games they've played an enormous number of times.
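To make that accounting concrete, here is a rough back-of-the-envelope sketch (assuming the language-model rule of thumb of about 6 FLOPs per parameter per training example, and with made-up model sizes and step counts) of how the same compute budget can be spent on a small brain trained for a very long time or on a much bigger brain trained for far less:

```python
# Back-of-the-envelope compute accounting (illustrative numbers only).
# Training FLOPs scale roughly with (FLOPs per example, ~6 * parameters)
# times (number of examples trained on). RL has traditionally grown the
# second factor -- more games, or more replay of old games -- not the first.

def approx_training_flops(params, env_steps, replay_ratio=0.0, flops_per_param=6.0):
    """Very rough: ~6 FLOPs per parameter per example seen (forward + backward)."""
    examples = env_steps * (1.0 + replay_ratio)  # fresh data plus replayed data
    return flops_per_param * params * examples

# Typical RL: a small model trained on a huge number of environment steps...
small_many = approx_training_flops(params=25e6, env_steps=1e10)
# ...costs about as much as a 100x bigger model trained on 100x fewer steps.
big_few = approx_training_flops(params=2.5e9, env_steps=1e8)
print(f"{small_many:.1e} FLOPs vs {big_few:.1e} FLOPs")
```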
So for a while RL has lacked:
(1) The ability to scale up model size to reliably improve performance.
(2) (Even supposing the above existed) Any theory, like the language-model scaling laws, that would let you figure out how to allocate compute between a bigger model and longer training.
My intuition is that the lack of (1), and to a lesser degree the lack of (2), is evidence that no one has stumbled on the "right way" to do RL or RL-like problems. It's like language modeling when it only had LSTMs and no Transformers, before the frighteningly straight lines in log-log charts appeared.
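For comparison, the language-model version of (2) is strikingly simple once you have it. Here is a minimal sketch of a Chinchilla-style allocation rule (assuming a compute budget of roughly 6 * N * D FLOPs and the ~20-tokens-per-parameter ratio from that paper); nothing comparable currently exists for RL:

```python
# Chinchilla-style compute allocation for language models (rough version):
# given a budget C ~= 6 * N * D FLOPs, the compute-optimal split grows
# parameters N and training tokens D in about equal proportion, landing
# near D ~= 20 * N.

def chinchilla_style_allocation(compute_flops, tokens_per_param=20.0):
    """Rough compute-optimal (params, tokens) split under C = 6 * N * D."""
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's own budget of ~5.8e23 FLOPs gives back ~70B params, ~1.4T tokens.
params, tokens = chinchilla_style_allocation(5.8e23)
print(f"~{params / 1e9:.0f}B parameters, ~{tokens / 1e12:.1f}T tokens")
```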
In the last month, though, two RL papers came out with interesting scaling charts, each showing strong gains from parameter scaling. Both were (somewhat unsurprisingly) from DeepMind. This is the kind of thing that leads me to think "Huh, this might be an important link in the chain that brings about AGI."
The first paper is "Mastering Diverse Domains Through World Models", which names its agent DreamerV3. The second is "Human-Timescale Adaptation in an Open-Ended Task Space", which names its agent Adaptive...