April 29, 2026

Every AI Agent Has an Evaluation Gap | Alex Ratner, Snorkel AI

42 minutes

Alex Ratner co-founded Snorkel AI out of Chris Ré's Stanford lab and helped establish data-centric AI as a field. Today, Snorkel is a $1.3B company shipping thousands of data sets and environments a week to frontier labs and vertical AI teams like Harvey.

In this conversation, he argues our ability to build AI agents has outpaced our ability to measure them. That gap is what's keeping most enterprise agents stuck in demo purgatory.

If you can't measure it, you can't improve it. And you can't deploy it.

In this conversation:

The three axes of the evaluation gap: input complexity, autonomy horizon, and output complexity
Big Law Bench: how Snorkel and Harvey benchmarked legal agents on deep-research tasks that take lawyers 10-15 hours
What Snorkel's $3M Open Benchmarks Grant is funding, and why "benchmaxxing" critiques don't kill the case for public benchmarks
Why 40-50% of Snorkel's data work is still review and labeling, even with the best models in the loop
The "expert-agentic" era, where domain expertise (law, finance, coding, even woodworking) is the new bottleneck
Why self-supervision is a dead end outside narrow cases like distillation
The false dichotomy between data and environments, and why pure-environment vendors miss how AI actually works

Chapters

(00:00) Intro: Alex Ratner and Snorkel AI
(02:50) What the evaluation gap actually is
(06:05) Moravec's paradox and the jagged frontier
(08:46) Where AI agents fall down in enterprise work
(10:40) Big Law Bench: benchmarking Harvey's legal agents
(12:00) The three axes: input, autonomy horizon, output
(18:31) Snorkel's $3M Open Benchmarks Grant
(22:33) From "janitorial" to epicenter: 15 years of data-centric AI
(29:26) The expert-agentic data era
(34:54) The false dichotomy between data and environments
(40:05) DoorDash Tasks and expert data at scale

Connect with Alex Ratner:

X/Twitter: https://x.com/ajratner
Snorkel AI: https://snorkel.ai

Connect with Conor:

Newsletter: https://newsletter.chainofthought.show/
Twitter/X: https://x.com/ConorBronsdon
LinkedIn: https://www.linkedin.com/in/conorbronsdon/
YouTube: https://www.youtube.com/@ConorBronsdon

More episodes: https://chainofthought.show

Thanks to Galileo — download their free 165-page guide to mastering multi-agent systems at galileo.ai/mastering-multi-agent-systems

By Conor Bronsdon
