The Nonlinear Library: Alignment Forum

AF - Forecasting future gains due to post-training enhancements by elifland



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Forecasting future gains due to post-training enhancements, published by elifland on March 8, 2024 on The AI Alignment Forum.
This work has been done in the context of SaferAI's work on risk assessment. Equal contribution by Eli and Joel. I'm sharing this writeup in the form of a Google Doc and reproducing the summary below.
Disclaimer: this writeup is context for upcoming experiments, not complete work. As such it contains a lot of (not always well-justified) guess-work and untidy conceptual choices. We are publishing now despite this to get feedback.
If you are interested in this work - perhaps as a future collaborator or funder, or because this work could provide helpful input into e.g. risk assessments or RSPs - please get in touch with us at
Summary
A recent report documented how the performance of AI models can be improved after training via post-training enhancements (PTEs) such as external tools, scaffolding, and fine-tuning. The gain from a PTE is measured in compute-equivalent gain (CEG): the multiplier on training compute that a base model would need in order to match the performance of that model combined with the PTE.
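To make the CEG definition concrete, here is a minimal sketch. The logarithmic scaling law and all numbers below are hypothetical placeholders, not figures from the report; the point is only the mechanics of inverting a performance curve to express a PTE's gain as a compute multiplier.

```python
import math

def performance(compute: float) -> float:
    # Hypothetical scaling law: benchmark score grows with log10 of
    # training compute. A placeholder, not the report's actual curve.
    return math.log10(compute)

def compute_equivalent_gain(base_compute: float, enhanced_score: float) -> float:
    """CEG = multiplier on training compute the base model would need
    to match the score of (base model + PTE)."""
    # Invert the toy scaling law: compute required to reach the score.
    required_compute = 10 ** enhanced_score
    return required_compute / base_compute

base = 1e24                                # base model's training compute (FLOP)
score_with_pte = performance(base) + 0.5   # suppose the PTE adds 0.5 points
ceg = compute_equivalent_gain(base, score_with_pte)
# Under this toy law, a 0.5-point gain corresponds to CEG = 10**0.5 ≈ 3.2x.
```

Note that under a logarithmic scaling law a fixed score gain maps to a fixed compute multiplier, which is what makes CEG a convenient (if lossy) summary statistic.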
We are interested in understanding the contribution that PTEs make to AI system capabilities over time.
This question in turn is motivated by SaferAI's work on quantitative risk assessments of frontier models. In particular, any risk assessment of open-sourcing models or of having closed-source models stolen or leaked should take into account system capabilities, which we might expect to increase over time as PTEs are added to the system built on top of a given base model.
We extend a recent analysis of PTEs in order to understand the trend in CEG over time.
There are serious limitations in our preliminary analysis, including: problems with the CEG metric, many uninformed parameter estimates, and reliance on an ill-defined "average task".
High-priority future work includes running experiments to get more evidence on important uncertainties for our forecasts of capability gains due to PTEs. In particular, we think it will be important to understand how well different PTEs combine, as well as to directly study performance on benchmarks relevant to dangerous capabilities rather than relying on the CEG and average-task abstractions.
In this write-up, we will:
Outline our methodology.
Present CEG estimates for various PTEs.
Aggregate total CEG, using subjective estimates of 'composability.'
Note limitations of our analysis and important future work.
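One way to picture the aggregation step is a composability discount on naive multiplication of per-PTE gains. This is a hedged sketch of one such scheme, with made-up CEG values and a hypothetical single composability parameter; the write-up's actual subjective estimates may differ in form and value.

```python
def aggregate_ceg(cegs: list[float], composability: float) -> float:
    """Aggregate per-PTE compute-equivalent gains.

    composability = 1.0 means the PTEs stack fully (gains multiply);
    composability = 0.0 means adding more PTEs yields no extra gain.
    Intermediate values shrink each gain's exponent accordingly.
    """
    total = 1.0
    for gain in cegs:
        total *= gain ** composability  # discount each PTE's contribution
    return total

# Three PTEs with illustrative CEGs of 5x, 3x, and 2x.
full_stack = aggregate_ceg([5.0, 3.0, 2.0], composability=1.0)   # 30x
partial    = aggregate_ceg([5.0, 3.0, 2.0], composability=0.7)   # < 30x
```

The design choice here is that composability acts on the exponent rather than the multiplier directly, so the discount compounds smoothly as more PTEs are added; other parameterizations are equally plausible.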
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.