


I contributed one (1) task to HCAST, which was used in METR's Long Tasks paper. This gave me some thoughts I feel moved to share.
Regarding Baselines and Estimates
METR's tasks have two sources for how long they take humans: most of those used in the paper were Baselined using playtesters under persistent scrutiny, and some were Estimated by METR.
I don’t quite trust the Baselines. Baseliners were allowed/incentivized to drop tasks they weren’t making progress with, and were (mostly, effectively; there's some nuance here I’m ignoring) cut off at the eight-hour mark. Baseline times were found by averaging the time taken on successful runs, so slow attempts that were abandoned or cut off never enter the average. This suggests Baseline estimates will be biased at least slightly low, especially for more difficult tasks.[1]
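To make the direction of that bias concrete, here is a toy simulation. Everything in it is an assumption for illustration, not METR's methodology or data: each attempt's true completion time is drawn from a lognormal distribution with made-up parameters (the hypothetical `simulate` function and its `median_hours` and `sigma` arguments), a run counts as "successful" only if it finishes within eight hours, and voluntarily dropping a task early is not modelled at all.

```python
import math
import random
import statistics

# Hypothetical cutoff mirroring the eight-hour mark described above.
CUTOFF_HOURS = 8.0


def simulate(median_hours: float, sigma: float = 0.8, n: int = 100_000, seed: int = 0):
    """Toy model (assumed lognormal times, illustrative parameters only).

    Returns (true mean time, mean time of runs finishing within the cutoff,
    fraction of runs finishing within the cutoff).
    """
    rng = random.Random(seed)
    mu = math.log(median_hours)  # lognormal median is exp(mu)
    times = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    # Only runs that finish before the cutoff are kept, as "successful" runs.
    successes = [t for t in times if t <= CUTOFF_HOURS]
    true_mean = statistics.fmean(times)
    baseline_mean = statistics.fmean(successes) if successes else float("nan")
    success_rate = len(successes) / n
    return true_mean, baseline_mean, success_rate


# Easier to harder hypothetical tasks, by median completion time.
for median in (1.0, 4.0, 8.0):
    true_mean, baseline_mean, rate = simulate(median)
    print(f"median={median:>4}h  true mean={true_mean:5.2f}h  "
          f"mean of successes={baseline_mean:5.2f}h  success rate={rate:.0%}")
```

In this toy model, the mean of successful runs is always below the true mean, and the gap widens as tasks get harder, because a larger share of the slow runs is discarded before averaging.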
I really, really don’t trust the Estimates[2]. My task was never successfully Baselined, so METR's main source for how long it would take – [...]
---
Outline:
(00:22) Regarding Baselines and Estimates
(02:23) Regarding Task Privacy
(04:00) In Conclusion
The original text contained 9 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong
