
METR released a new paper with very interesting results on how AI tools affect developer productivity. I have copied their blog post here in full.
We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower. We view this result as a snapshot of early-2025 AI capabilities in one relevant setting; as these systems continue to rapidly evolve, we plan on continuing to use this methodology to help estimate AI acceleration from AI R&D automation [1].
See the full paper for more detail.
Motivation
While coding/agentic benchmarks [2] have proven useful for understanding AI capabilities, they typically sacrifice realism for scale and efficiency—the tasks are self-contained, don’t require prior context to understand, and use algorithmic evaluation [...]
---
Outline:
(01:23) Motivation
(02:39) Methodology
(03:56) Core Result
(05:15) Factor Analysis
(06:12) Discussion
(11:08) Going Forward
---
Narrated by TYPE III AUDIO.
By LessWrong
