January 01, 2026

This Deep Research Agent Ignored the Benchmark and Still Won

29 minutes

Tavily built a Deep Research Agent with production in mind: something they could actually scale. So they did the unsexy work. They went through millions of agent logs, found where tokens were being wasted, and optimized each part of the system.

The result surprised them: they cut token consumption by more than half (!), then tested quality and discovered they topped the DeepResearch Bench without even trying.

In this YAAP episode, Yuval sits down with Dean from Tavily to break down how they built it, what they did differently from the usual top approaches, and which design choices made better results possible with far fewer tokens.

What you’ll learn:

How to reduce token burn without tanking quality

Why reading millions of logs beats chasing the flashiest tech

The design choices that pushed quality up while tokens dropped hard


By AI21
