Claude Code Cast

The Agent Benchmark That Should Scare Managers


Agentic coding tools are moving into enterprise workflows, but this week's most useful signal is a benchmark on which frontier models still score below 50% on real IT tasks. Alex and Sam unpack Microsoft Learn grounding, agent deception, Copilot data leaks, and the practical harness every team should build before handing agents production authority.

Claude Code Cast by AI World