
METR released a new paper with very interesting results on how AI tools affect developer productivity. I have copied their blog post here in full.
We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower. We view this result as a snapshot of early-2025 AI capabilities in one relevant setting; as these systems continue to rapidly evolve, we plan on continuing to use this methodology to help estimate AI acceleration from AI R&D automation [1].
See the full paper for more detail.
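To make the headline figure concrete, here is a minimal, purely illustrative sketch in Python (hypothetical data; not METR's actual estimation procedure or dataset) of how a multiplicative slowdown like "19% longer" can be estimated from task completion times under two randomized conditions:

```python
# Purely illustrative sketch: hypothetical data, not METR's estimation procedure.
# Shows how a multiplicative slowdown such as "19% longer" can be estimated
# from task completion times under two randomized conditions.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical completion times (minutes) for tasks randomized to each condition.
# Log-normal is a common choice for right-skewed duration data.
times_without_ai = rng.lognormal(mean=4.0, sigma=0.6, size=200)
times_with_ai = rng.lognormal(mean=4.0 + np.log(1.19), sigma=0.6, size=200)

# Comparing mean log-times gives a multiplicative effect (ratio of geometric means).
ratio = np.exp(np.log(times_with_ai).mean() - np.log(times_without_ai).mean())
print(f"Estimated effect: tasks take {(ratio - 1) * 100:.0f}% longer with AI")
```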
Motivation
While coding/agentic benchmarks [2] have proven useful for understanding AI capabilities, they typically sacrifice realism for scale and efficiency—the tasks are self-contained, don’t require prior context to understand, and use algorithmic evaluation [...]
---
Outline:
(01:23) Motivation
(02:39) Methodology
(03:56) Core Result
(05:15) Factor Analysis
(06:12) Discussion
(11:08) Going Forward
---
Narrated by TYPE III AUDIO.