The Agent Benchmark That Should Scare Managers

Agentic coding tools are moving into enterprise workflows, but the week's most useful signal is a benchmark of real IT tasks on which frontier models still score below 50%. Alex and Sam unpack Microsoft Learn grounding, agent deception, Copilot data leaks, and the practical harness every team should build before handing agents production authority.

A weekly podcast covering everything Claude Code, from the latest Anthropic updates and AI coding news to practical tips, prompt strategies, and workflow optimization. Hosts Alex and Sam break down what matters for developers using AI coding tools, AI agents, and agentic AI; compare the competitive landscape; spotlight community projects; and share actionable advice you can use in your next coding session.

Topics covered: AI Software Engineering, Claude AI tutorials, Anthropic API, Model Context Protocol (MCP), Agentic Coding, LLM-powered development, and Software Architecture with AI.