
Deep dive with Dr. Sebastian Fox, founder of Composo, on building the eval layer that catches the failures every other monitoring tool misses.
Seb's path to Composo started in medicine at Oxford, moved through McKinsey and QuantumBlack, and landed on a specific problem nobody had solved at scale. Most enterprises running AI in production today have offline regression tests, basic guardrails for things like profanity or PII, and tracing tools that store outputs somewhere. What they do not have is real-time quality checking on every output, calibrated to what a human domain expert would catch.
Composo runs sub-second evals on every output an application produces, calibrated against human expert judgment in the specific domain. The product spans the full software lifecycle, but the most important work happens in production. Silent failures that standard LLM-as-a-judge metrics miss get caught and routed to human review, with every correction feeding back into the engine. Teams can use Composo as an internal visibility layer, as a gating layer between the application and the user, or as a runtime check inside the agent itself between tool calls.
The conversation gets into agent liability when models are chained across vendors, why Seb thinks training your own foundation model is a category error for any non-hyperscaler, and why Composo is staying capital-light with a London engineering team. Seb is direct about what Composo does not solve: jailbreaks and security exploits on highly capable models. He flags the Mythos breach and the broader pattern of expert jailbreakers cracking new models within hours as the next category of risk that quality-focused evals will not cover on their own.
Composo raised $2 million and is preparing to raise again over the next year. Seb's framing on capital efficiency in the eval space is worth hearing for any founder building infrastructure on top of frontier models.
—
Agentic Stories is the weekday briefing on the AI agent economy — governance, security, and deployment. Deep Dives drop on off-days with founders building in the space. New episodes Monday, Wednesday, Friday.
agenticstories.ai
By Alex Hirsu