The Nonlinear Library: Alignment Forum

AF - Clarifying METR's Auditing Role by Beth Barnes

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Clarifying METR's Auditing Role, published by Beth Barnes on May 30, 2024 on The AI Alignment Forum.
Although METR has never claimed to have audited anything or to be providing meaningful oversight or accountability, there has been some confusion about whether METR is an auditor or planning to be one.
To clarify this point:
1. METR's top priority is to develop the science of evaluations, and we don't need to be auditors in order to succeed at this.
We aim to build evaluation protocols that can be used by evaluators/auditors regardless of whether that is the government, an internal lab team, another third party, or a team at METR.
2. We should not be considered to have 'audited' GPT-4 or Claude.
Those were informal pilots of what an audit might involve, or research collaborations - not providing meaningful oversight. For example, it was all under NDA - we didn't have any right or responsibility to disclose our findings to anyone outside the labs - and there wasn't any formal expectation it would inform deployment decisions. We also didn't have the access necessary to perform a proper evaluation. In the OpenAI case, as is noted in their system card:
"We granted the Alignment Research Center (ARC) early access to the models as a part of our expert red teaming efforts … We provided them with early access to multiple versions of the GPT-4 model, but they did not have the ability to fine-tune it. They also did not have access to the final version of the model that we deployed.
The final version has capability improvements relevant to some of the factors that limited the earlier models' power-seeking abilities, such as longer context length, and improved problem-solving abilities as observed in some cases. … fine-tuning for task-specific behavior could lead to a difference in performance.
As a next step, ARC will need to conduct experiments that (a) involve the final version of the deployed model (b) involve ARC doing its own fine-tuning, before a reliable judgment of the risky emergent capabilities of GPT-4-launch can be made".
3. We are and have been in conversation with frontier AI companies about whether they would like to work with us in a third-party evaluator capacity, with various options for how this could work.
As it says on our website:
"We have previously worked with Anthropic, OpenAI, and other companies to pilot some informal pre-deployment evaluation procedures. These companies have also given us some kinds of non-public access and provided compute credits to support evaluation research.
We think it's important for there to be third-party evaluators with formal arrangements and access commitments - both for evaluating new frontier models before they are scaled up or deployed, and for conducting research to improve evaluations.
We do not yet have such arrangements, but we are excited about taking more steps in this direction."
4. We are interested in conducting third-party evaluations and may hire & fundraise to do so, but would also be happy to see other actors enter the space. Whether we expand our capacity here depends on many factors such as:
Whether governments mandate access/this kind of relationship.
Whether governments want to work with third parties vs conduct audits in-house.
Whether frontier AI companies are keen to work with us in this capacity, giving us the necessary access to do so.
How successful we are in hiring the talent we need to do this without detracting from our top priority of developing the science.
How successful governments or other third-party evaluators are at performing evaluation protocols sufficiently well.
Technical considerations of what kind of expertise is required for doing good elicitation.
Etc.
If you're interested in helping METR conduct third-party evaluations in-house and/or support government or other auditors t...