When Splitting One Model Across Three Agents Doubles Its Accuracy
Source: NeuroMAS: Multi-Agent Systems as Neural Networks with Joint Reinforcement Learning
Paper was published on May 16, 2026
This episode was AI-generated on May 20, 2026. The script was written by an AI language model and the host voices were synthesized by Eleven Labs. The producer is not affiliated with Anthropic or Eleven Labs.
Take a small language model, freeze it, and give it a fixed budget of trainable parameters. Putting all those parameters into one agent gets you 24% on a physics exam. Splitting them across three agents that talk to each other in plain English gets you 44%, with the same backbone, the same trainable-parameter budget, and the same reward signal. A new paper argues that organization itself is a scaling axis we've been ignoring, and that the training schedule can decide whether a larger team of agents succeeds or fails.
Key Takeaways
Why a controlled comparison shows three agents sharing a parameter budget can nearly double the accuracy of a single agent with the same budget
How REINFORCE lets you train a graph of language models end-to-end using just one bit of reward, despite the signals between agents being discrete text
The progressive growth result: identical seven-node architectures either fail or succeed depending entirely on whether you train them from scratch or grow them from a smaller working system
Why the paper's 'role-free' framing is doing slightly more rhetorical work than it should: structural prompting still bakes in real priors
The missing experiment that would make this work bulletproof: an inference-cost-matched baseline, and a sweep showing the gains survive at frontier-model scale
A concrete warning for anyone building multi-agent systems: naively scaling up the number of agents can make performance actively worse

00:00 — The 20-point gap that motivates the paper
A headline result where splitting a fixed parameter budget across three communicating agents dramatically outperforms putting it all in one.
03:40 — Agents as positions in a graph, not job titles
The paper's central reframe: treat multi-agent systems as neural networks where each neuron is a language model and each signal is text.
07:21 — How you train a network you can't differentiate through
Walking through REINFORCE and the basketball-scoreboard intuition for why a single reward signal can teach an entire graph of agents to specialize.
11:01 — The controlled experiment and what it does and doesn't show
Examining the parameter-matched comparison across ARC, MMLU physics, and HumanEval, along with the inference-cost and backbone-scale caveats.
14:42 — Why bigger teams fail from scratch but succeed when grown
The progressive growth result: the same architecture flips from a downward to an upward scaling curve depending on the training schedule.
18:23 — How 'role-free' the agents actually are
A careful look at how much structural information the prompts still encode, even without semantic role descriptions.
22:03 — What this means for the field
Three takeaways: a new scaling axis accessible outside frontier labs, a reframe of the agent abstraction, and a warning against naive multi-agent scaling.

Recommended Reading
The Bitter Lesson — Sutton's essay, which the NeuroMAS authors invoke directly to argue against hand-engineered multi-agent structure; essential background for the episode's central framing.
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning — Williams's original REINFORCE paper, the policy gradient algorithm Cassidy walks through with the basketball coaching analogy.
Net2Net: Accelerating Learning via Knowledge Transfer — Chen et al.'s function-preserving network growth, the closest classical precedent for NeuroMAS's progressive growth trick of inserting zero-initialized nodes into an already-trained graph.
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines — An alternative approach to optimizing multi-LLM systems by learning prompts rather than adapter weights; a useful contrast to NeuroMAS's neural-network-style framing of agent graphs.
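
For listeners who want a feel for the REINFORCE idea from the 07:21 segment, here is a minimal toy sketch in Python. It is not the paper's implementation. Everything in it is an assumption made for illustration: the two "agents" are small lookup tables rather than language models, the "message" is a single discrete token rather than text, and the names `sender`, `receiver`, `train`, and `greedy_accuracy` are hypothetical. What it does share with the setup described in the episode is the shape of the problem: a discrete signal sits between two nodes, so nothing can be backpropagated through it, and both nodes are updated from the same one-bit reward.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index from a categorical distribution."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def reinforce_step(logits, action, advantage, lr):
    """REINFORCE update in place: logits += lr * advantage * grad log pi(action).

    For a softmax policy, d log pi(action) / d logit_i = 1[i == action] - pi_i.
    """
    probs = softmax(logits)
    for i in range(len(logits)):
        grad = (1.0 if i == action else 0.0) - probs[i]
        logits[i] += lr * advantage * grad


def train(episodes=5000, lr=0.5, seed=0):
    """Train a two-node chain (sender -> receiver) from a shared 0/1 reward."""
    rng = random.Random(seed)
    sender = [[0.0, 0.0] for _ in range(2)]    # sender[bit][token]
    receiver = [[0.0, 0.0] for _ in range(2)]  # receiver[token][answer]
    baseline = 0.5  # running average of reward, used to reduce variance
    for _ in range(episodes):
        bit = rng.randrange(2)                          # the "question"
        token = sample(softmax(sender[bit]), rng)       # discrete message
        answer = sample(softmax(receiver[token]), rng)  # final output
        reward = 1.0 if answer == bit else 0.0          # one bit of feedback
        advantage = reward - baseline
        # Both nodes receive the same scalar signal; no gradient crosses the message.
        reinforce_step(sender[bit], token, advantage, lr)
        reinforce_step(receiver[token], answer, advantage, lr)
        baseline += 0.05 * (reward - baseline)
    return sender, receiver


def greedy_accuracy(sender, receiver):
    """Fraction of the two possible inputs answered correctly when acting greedily."""
    correct = 0
    for bit in range(2):
        token = max(range(2), key=lambda t: sender[bit][t])
        answer = max(range(2), key=lambda a: receiver[token][a])
        correct += int(answer == bit)
    return correct / 2
```

Calling `greedy_accuracy(*train())` shows how well the pair coordinates after training under these toy assumptions. The point of the sketch is only the update rule: each node nudges up the probability of whatever it did when the shared reward beat the baseline, and nudges it down otherwise.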