Dave Linthicum Is Not AI

AI Hype Gets Wrecked by Real-World Job Test



This video breaks down a new paper, Remote Labor Index: Measuring AI Automation of Remote Work, and why it challenges the biggest claims in the AI hype cycle. Instead of scoring AI on narrow tests, the researchers evaluated frontier systems on 240 real freelance projects across 23 categories of paid remote work, using actual briefs, input files, and human-created deliverables as the benchmark. The result was brutal: current AI agents performed near the floor, with the best system reaching only a 2.5% automation rate.

In other words, even the strongest model could complete only a tiny fraction of projects at a level a reasonable client would accept as commissioned work. The study also found widespread failure modes, including poor quality, incomplete outputs, corrupted files, and inconsistent deliverables. That matters because these were not toy tasks; the benchmark represented over 6,000 hours of real work and more than USD 140,000 in project value. The takeaway is not that AI has zero value, but that the real-world automation story is far weaker than the hype suggests, especially when judged on end-to-end professional freelance work. The study shows that progress is real, but broad autonomous labor replacement is still much more promise than present-day reality.


By David Linthicum