
Sign up to save your podcasts
Or


In this episode, drawing on insights from the sources, METR researcher Joel Becker explores the widening gap between AI’s exponential progress on benchmarks and its actual impact on real-world productivity. We examine a surprising study where expert developers were slowed down by 19% when using AI, challenging the assumption that benchmark success translates directly into immediate economic gains. The discussion investigates the "puzzle" of why low AI reliability and the complexity of high-context environments continue to hinder performance in the field compared to synthetic tests.
Source: https://www.youtube.com/watch?v=RhfqQKe22ZA&list=TLGGeQVQrQpc6NgyODEyMjAyNQ
By Build Wiz AIIn this episode, drawing on insights from the sources, METR researcher Joel Becker explores the widening gap between AI’s exponential progress on benchmarks and its actual impact on real-world productivity. We examine a surprising study where expert developers were slowed down by 19% when using AI, challenging the assumption that benchmark success translates directly into immediate economic gains. The discussion investigates the "puzzle" of why low AI reliability and the complexity of high-context environments continue to hinder performance in the field compared to synthetic tests.
Source: https://www.youtube.com/watch?v=RhfqQKe22ZA&list=TLGGeQVQrQpc6NgyODEyMjAyNQ