


Scale AI and the Center for AI Safety tested frontier agents on 240 real freelance jobs.
Not toy benchmarks.
Result:
Why?
Because real work is messy.
• Specs change mid-project
Agents don’t fail because they lack raw intelligence.
They fail because they need structure.
What the study also showed:
AI performs well in:
Clear inputs. Clear outputs. Defined scope.
The insight:
AI struggles with ambiguity.
So the real opportunity isn’t replacing complex professionals.
It’s redesigning workflows.
Humans own:
AI owns:
The companies that win won’t drop agents into chaos.
They’ll redesign work so machines handle the structured layer and humans operate at the messy layer.
Messy is still job security.
Structure is already automated.
Get my weekly breakdown of AI, GTM, and Cloud Employees:
https://atonom.ai/newsletter
Ready to hire your first Cloud Employee?
https://atonom.ai/
By Gabe Larsen