


Every few weeks at Microsoft, someone would build an AI prototype that blew everyone's minds. Three months later? Dead. "We can never ship that." Dan Klein watched this happen for five years before he decided to do something about it.
Dan is co-founder and CTO of Scaled Cognition, a professor of computer science at UC Berkeley, and winner of the ACM Grace Murray Hopper Award. His previous startups include adap.tv (acquired by AOL for $405M) and Semantic Machines (acquired by Microsoft in 2018), where he spent five years integrating conversational AI. His PhD students now run AI teams at Google, Stanford, MIT, and OpenAI.
At Scaled Cognition, Dan's team built APT1 (the Agentic Pre-trained Transformer) for under $11 million. It's a model designed for actions, not tokens, with structural guarantees that go beyond prompt-and-pray.
Dan makes the case that current LLMs are plausibility engines, not truth engines, and that the gap between demo and production is where most AI projects die.
Chapters
(0:00) Cold open: RL is about doubling down on what works
(0:28) Introducing Dan Klein and Scaled Cognition
(2:53) The demo-to-production gap: why AI prototypes die
(5:40) Why prompting is not a real control surface
(8:06) Modular decomposition vs. end-to-end optimization
(10:55) Are LLMs fundamentally mismatched with how we use them?
(14:26) What's wrong with benchmarks today
(20:27) APT1: building a model for actions, not tokens
(24:14) What makes data truly agentic
(28:02) Hallucinations as an iceberg — visible vs. undetectable
(34:16) Building a prototype model for under $11 million
(39:57) Applying RL to conversations without a zero-sum winner
(43:31) LLMs as a condensation of the web — and what happens when it runs out
(50:07) Reasoning models: where they work and where they don't
(53:04) Early deployments in regulated industries
(57:14) Why multi-model checking fails
(1:00:34) The minimum bar for trustworthy agentic systems
(1:04:07) Societal risk: when AI output is indistinguishable from truth
(1:13:33) Where Dan is inspired in AI research today
Connect with Dan Klein:
Connect with Conor:
More episodes: https://chainofthought.show
Thanks to Galileo — download their free 165-page guide to mastering multi-agent systems at galileo.ai/mastering-multi-agent-systems
By Conor Bronsdon
