May 20, 2026

When Splitting One Model Across Three Agents Doubles Its Accuracy

25 minutes

Source: NeuroMAS: Multi-Agent Systems as Neural Networks with Joint Reinforcement Learning

Paper was published on May 16, 2026

This episode was AI-generated on May 20, 2026. The script was written by an AI language model and the host voices were synthesized by Eleven Labs. The producer is not affiliated with Anthropic or Eleven Labs.

Take a small language model, freeze it, and give it a fixed budget of trainable parameters. Putting all those parameters into one agent gets you 24% on a physics exam. Splitting them across three agents that talk to each other in plain English gets you 44%, with the same model, the same trainable-parameter budget, and the same reward signal. A new paper argues that organization itself is a scaling axis we've been ignoring, and that the training schedule can decide whether a larger system succeeds or fails.

Key Takeaways

Why a controlled comparison shows three agents sharing a parameter budget can nearly double the accuracy of a single agent with the same budget

How REINFORCE lets you train a graph of language models end-to-end using just one bit of reward, despite the signals between agents being discrete text

The progressive growth result: identical seven-node architectures either fail or succeed depending entirely on whether you train them from scratch or grow them from a smaller working system

Why the paper's 'role-free' framing is doing slightly more rhetorical work than it should — structural prompting still bakes in real priors

The missing experiments that would make this work bulletproof: an inference-cost-matched baseline, and a sweep showing whether the gains survive at frontier-model scale

A concrete warning for anyone building multi-agent systems: naively scaling up the number of agents can make performance actively worse

00:00 — The 20-point gap that motivates the paper
A headline result where splitting a fixed parameter budget across three communicating agents dramatically outperforms putting it all in one.

03:40 — Agents as positions in a graph, not job titles
The paper's central reframe: treat multi-agent systems as neural networks where each neuron is a language model and each signal is text.

07:21 — How you train a network you can't differentiate through
Walking through REINFORCE and the basketball-scoreboard intuition for why a single reward signal can teach an entire graph of agents to specialize.

11:01 — The controlled experiment and what it does and doesn't show
Examining the parameter-matched comparison across ARC, MMLU physics, and HumanEval — and the inference-cost and backbone-scale caveats.

14:42 — Why bigger teams fail from scratch but succeed when grown
The progressive growth result: the same architecture flips from a downward to an upward scaling curve depending on the training schedule.

18:23 — How 'role-free' the agents actually are
A careful look at how much structural information the prompts still encode, even without semantic role descriptions.

22:03 — What this means for the field
Three takeaways: a new scaling axis accessible outside frontier labs, a reframe of the agent abstraction, and a warning against naive multi-agent scaling.
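
For listeners who want to see the REINFORCE idea from the 07:21 chapter in concrete form, here is a minimal toy sketch. It is not the paper's implementation. Two tiny tabular policies stand in for language models, integer messages stand in for text, and every name (`sender`, `receiver`, `train`, `accuracy`) is hypothetical. The running-average baseline is an added assumption for variance reduction, not something the episode describes. The point it illustrates is that one scalar reward at the end can update every node, even though the message passed between nodes is discrete and cannot be differentiated through.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index from a categorical distribution."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def reinforce_update(logits, chosen, advantage, lr):
    """REINFORCE step for a softmax policy, applied in place.

    The gradient of log p(chosen) with respect to logit a is
    (1 if a == chosen else 0) - p(a); it is scaled by the advantage.
    """
    probs = softmax(logits)
    for a in range(len(logits)):
        grad_logp = (1.0 if a == chosen else 0.0) - probs[a]
        logits[a] += lr * advantage * grad_logp


def train(steps=5000, lr=0.5, seed=0):
    """Train a two-node chain: the sender sees x, the receiver sees only the message."""
    rng = random.Random(seed)
    sender = [[0.0, 0.0], [0.0, 0.0]]    # sender[x][message]
    receiver = [[0.0, 0.0], [0.0, 0.0]]  # receiver[message][answer]
    baseline = 0.0                       # running average of reward (assumption)
    for _ in range(steps):
        x = rng.randrange(2)
        msg = sample(softmax(sender[x]), rng)       # discrete "text" signal
        ans = sample(softmax(receiver[msg]), rng)
        reward = 1.0 if ans == x else 0.0           # single scalar at the end
        advantage = reward - baseline
        # The same scalar updates both nodes; nothing is backpropagated through msg.
        reinforce_update(sender[x], msg, advantage, lr)
        reinforce_update(receiver[msg], ans, advantage, lr)
        baseline += 0.05 * (reward - baseline)
    return sender, receiver


def accuracy(sender, receiver, trials=1000, seed=1):
    """Fraction of fresh draws where the chain reproduces x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.randrange(2)
        msg = sample(softmax(sender[x]), rng)
        ans = sample(softmax(receiver[msg]), rng)
        hits += ans == x
    return hits / trials
```

Calling `accuracy(*train())` reports how often the trained pair answers correctly. No particular number is claimed here, and a real system would replace the lookup tables with language-model policies and the integers with generated text.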