January 15, 2026

Why AI Leaderboards Miss the Point

56 minutes

Leaderboards reward “best average score.”

Real users reward “answer fast, don’t hallucinate, don’t bankrupt me.”

In this special deep-dive episode, AI21’s CTO Barak Lenz walks through four gaps between what models can do and what real AI systems deliver: validation, contextualization (pick the right approach per input), latency (parallelize and stop early), and decomposition (make those choices continuously inside long workflows).

Less “best model.” More “best execution.”
