
Episode Title: What ICLR 2026 Taught Us About Multi-Agent Failures
Episode Summary: We scanned ICLR 2026 accepted papers and found 14 that address real problems when building multi-agent systems: slow pipelines, expensive token bills, cascading errors, brittle topologies, and opaque agent coordination. This episode walks through five production problems and the research that provides concrete solutions.
Timestamps
00:00 - Introduction: The gap between demos and production
01:29 - Problem 1: Why is my agent system so slow?
04:44 - Problem 2: My token bills are out of control
07:30 - Problem 3: One agent hallucinates, the whole pipeline fails
10:45 - Problem 4: My agent graph breaks when I swap a model
12:53 - Problem 5: I have no idea what my agents are saying to each other
15:39 - Recap: The practitioner's toolkit
16:33 - What's still missing: Long-term stability and adversarial robustness
17:02 - Closing
Papers Discussed
Problem 1: Latency
Speculative Actions - Uses faster draft models to predict likely actions and execute API calls in parallel (a toy sketch follows this list). Up to 30% speedup across web search and OS control tasks.
Graph-of-Agents - Uses model cards to filter agents by relevance. Beat a 6-agent baseline using only 3 selected agents.
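For intuition, here is a minimal speculative-execution sketch in Python: guess the action with a cheap policy, start the API call immediately, and keep the result only if the slow policy agrees. The draft/target policies and the API call are toy stand-ins, not anything from the paper.

```python
# Toy sketch of speculative action execution. draft_policy, target_policy,
# and execute are hypothetical placeholders for a fast model, a slow model,
# and a slow external API call.
import asyncio

async def draft_policy(state: str) -> str:
    return f"search({state!r})"          # fast, cheap guess at the next action

async def target_policy(state: str) -> str:
    await asyncio.sleep(1.0)             # stands in for a slow LLM call
    return f"search({state!r})"

async def execute(action: str) -> str:
    await asyncio.sleep(0.5)             # stands in for a slow API call
    return f"result of {action}"

async def speculative_step(state: str) -> str:
    # Start executing the guessed action while the big model verifies it.
    guess = await draft_policy(state)
    speculative = asyncio.create_task(execute(guess))
    verified = await target_policy(state)
    if verified == guess:
        return await speculative          # right guess: latency fully overlapped
    speculative.cancel()                  # wrong guess: discard in-flight call
    return await execute(verified)

print(asyncio.run(speculative_step("iclr 2026 deadline")))
```

When the guess is right, the API latency hides entirely behind the big model's decision time, which is where the reported speedups come from.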
Problem 2: Token Costs
KVComm - Shares KV cache directly instead of translating to English; sharing only 30% of KV layers achieves near-full performance (see the cache-handoff sketch after this list).
MEM1 - Uses RL-based memory consolidation to maintain constant context size. 3.7x memory reduction, 3.5x performance improvement.
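A minimal illustration of the cache-handoff idea using Hugging Face transformers, assuming both "agents" share the same backbone (GPT-2 here). This shows plain past_key_values reuse, not KVComm's actual layer-selection protocol.

```python
# Toy sketch of KV-cache handoff between two "agents" with the same backbone.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# "Agent A" encodes the shared context once and keeps the KV cache.
ctx_ids = tok("Shared task context both agents need.", return_tensors="pt").input_ids
with torch.no_grad():
    cache = model(ctx_ids, use_cache=True).past_key_values

# "Agent B" continues from A's cache instead of re-reading the context text,
# paying only for its own new tokens.
new_ids = tok(" Next step:", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(new_ids, past_key_values=cache, use_cache=True)
print(tok.decode(out.logits[0, -1].argmax()))  # greedy next token
```

The point of the "English is a terrible data transfer protocol" quote below: the vectors in `cache` already encode the context, so re-serializing it as prose just to re-tokenize it is pure overhead.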
Problem 3: Error Cascades
When Does Divide and Conquer Work - Noise decomposition framework identifying task noise, model noise (superlinear growth), and aggregator noise.
DoVer - Intervention-driven debugging that edits message history to validate failure hypotheses (sketched after this list). Flips 28% of failures to successes.
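A toy version of the intervention loop: record a trace, patch one suspect message, replay the downstream pipeline, and check whether the outcome flips. `run_pipeline` and the trace format are hypothetical stand-ins, not DoVer's interface.

```python
# Toy sketch of intervention-driven debugging: edit one message in a recorded
# trace, replay, and see if the failure flips to success.
from copy import deepcopy

def run_pipeline(trace):
    """Toy downstream pipeline: 'succeeds' only if no message claims 2+2=5."""
    return all("2+2=5" not in msg["content"] for msg in trace)

trace = [
    {"agent": "planner",  "content": "Compute 2+2, then report."},
    {"agent": "solver",   "content": "I computed 2+2=5."},   # suspected bad hop
    {"agent": "reporter", "content": "Final answer follows."},
]

def intervene(trace, idx, patched_content):
    patched = deepcopy(trace)
    patched[idx]["content"] = patched_content
    return run_pipeline(patched)

baseline = run_pipeline(trace)                      # False: pipeline fails
flipped = intervene(trace, 1, "I computed 2+2=4.")  # True: failure flips
if not baseline and flipped:
    print("Hypothesis validated: message 1 caused the cascade.")
```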
Problem 4: Brittle Topologies
CARD - Conditional graph generation that adapts topology based on environmental signals.
MAS² - Generator-implementer-rectifier team that self-architects agent structures. 19.6% performance gain with cross-backbone generalization.
Stochastic Self-Organization - Decentralized approach using Shapley value approximations (see the Shapley sketch after this list). Hierarchy emerges from competence without explicit design.
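For intuition on the credit-assignment piece, here is a standard Monte Carlo Shapley estimate over a toy team. `team_value` is a made-up scoring function, not the paper's competence signal.

```python
# Toy Monte Carlo Shapley estimate: average each agent's marginal contribution
# over random join orders.
import random

AGENTS = ["planner", "coder", "critic"]

def team_value(team: set) -> float:
    # Toy: the coder does most of the work; the critic only helps the coder.
    score = 0.0
    if "coder" in team:
        score += 1.0
        if "critic" in team:
            score += 0.5
    if "planner" in team:
        score += 0.25
    return score

def shapley_estimate(samples: int = 10_000) -> dict:
    credit = {a: 0.0 for a in AGENTS}
    for _ in range(samples):
        order = random.sample(AGENTS, k=len(AGENTS))  # random join order
        team = set()
        for agent in order:
            before = team_value(team)
            team.add(agent)
            credit[agent] += team_value(team) - before
    return {a: v / samples for a, v in credit.items()}

print(shapley_estimate())  # the coder gets the largest marginal credit
```

Agents with consistently high marginal credit end up coordinating others; that is the sense in which "the hierarchy emerges from competence."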
Problem 5: Observability
GLC - Autoencoder creates compressed symbols aligned with human concepts via contrastive learning. Speed of symbols, auditability of words.
Emergent Coordination - Information-theoretic metrics distinguishing real collaboration from "spurious temporal coupling." Key finding: you must prompt for theory of mind.
ROTE / Modeling Others' Minds as Code - Models agent behavior as executable scripts (a toy version follows this list). 50% improvement in prediction accuracy.
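A toy rendering of the behavior-as-code idea: keep a library of candidate scripts, score each against an agent's observed actions, and let the best fit predict the next move. The scripts and the game are illustrative, not the paper's setup.

```python
# Toy sketch of modeling another agent's behavior as executable code.
def always_cooperate(opp_history):
    return "cooperate"

def tit_for_tat(opp_history):
    return opp_history[-1] if opp_history else "cooperate"

SCRIPTS = {"always_cooperate": always_cooperate, "tit_for_tat": tit_for_tat}

# Observed play: the opponent's moves and the agent's responses to them.
opp_moves   = ["cooperate", "defect", "defect", "cooperate"]
agent_moves = ["cooperate", "cooperate", "defect", "defect"]

def fit(script):
    """Fraction of observed moves the script reproduces from each prefix."""
    hits = sum(
        script(opp_moves[:i]) == agent_moves[i] for i in range(len(agent_moves))
    )
    return hits / len(agent_moves)

best = max(SCRIPTS, key=lambda name: fit(SCRIPTS[name]))
print(best, "predicts next move:", SCRIPTS[best](opp_moves))  # tit_for_tat
```

Once a script fits, prediction is just running it forward, which is the sense of the quote below about treating minds as software.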
Key Concepts Explained
Speculative execution - Borrowed from CPU architecture: guess the next action, execute in parallel, discard if wrong.
KV cache - The model's working memory of conversation context, stored as mathematical vectors.
Model noise - Model confusion that grows superlinearly with input length.
Shapley values - Game-theory concept for assigning credit to players in cooperative games.
Spurious temporal coupling - Agents appearing to collaborate while actually solving problems independently at the same time.
Contrastive learning - Pushing similar things closer and different things further apart in vector space (toy loss sketched below).
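To make the contrastive-learning entry concrete, here is a toy InfoNCE-style loss in PyTorch; random embeddings stand in for real data, and this is the generic formulation rather than GLC's specific objective.

```python
# Toy InfoNCE-style contrastive loss: matched (anchor, positive) pairs are
# pulled together, all other pairings pushed apart.
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.1):
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.T / temperature          # cosine similarities of all pairs
    labels = torch.arange(len(a))           # i-th anchor matches i-th positive
    return F.cross_entropy(logits, labels)  # low when matched pairs score highest

loss = info_nce(torch.randn(8, 32), torch.randn(8, 32))
print(float(loss))
```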
Key Quotes
"English is a terrible data transfer protocol for machines. We're taking clean mathematical concepts, translating them into paragraphs, and then asking another machine to turn them back into math."
"The hierarchy emerges from competence. You don't design it."
"Did they solve it together, or did they just all happen to solve it at the same time by themselves?"
"Treating minds as software is a pretty effective way to predict what software will do."
Links
Newsletter: llmsresearch.substack.com
Twitter/X: @llmsresearch
By LLMs Research