
This is METR's collection of resources for evaluating potentially dangerous autonomous capabilities of frontier models. The resources include a task suite, some software tooling, and guidelines on how to ensure an accurate measurement of model capability. Building on those, we’ve written an example evaluation protocol. While intended as a “beta” and early working draft, the protocol represents our current best guess as to how AI developers and evaluators should evaluate models for dangerous autonomous capabilities.
We hope to iteratively improve this content, with explicit versioning; this is v0.1.
---
First published:
Source:
Linkpost URL: https://metr.github.io/autonomy-evals-guide/index.html
Narrated by TYPE III AUDIO.