This paper proposes LatentMAS, a novel, training-free framework designed to improve the collaboration efficiency of Large Language Model (LLM)-based multi-agent systems (MAS). Unlike traditional approaches that rely on explicit natural language, LatentMAS carries out communication and reasoning entirely within the **continuous latent space** of the models. This is achieved through **auto-regressive latent thought generation** inside each agent and **lossless latent working memory transfer** across agents via shared KV caches. Experimental results demonstrate substantial computational benefits, including **4x to 4.3x faster end-to-end inference** and a **70.8% to 83.7% reduction in token usage** compared to text-based MAS baselines. The system also consistently achieves **higher system-level reasoning accuracy**, indicating that collaboration through continuous latent representations offers greater expressive capacity than discrete tokens.
By Enoch H. Kang
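
For a concrete picture of the two mechanisms named above, the sketch below is a minimal, hypothetical illustration, not the paper's implementation: one agent rolls out "latent thoughts" by feeding its final hidden state back in as the next input embedding instead of decoding tokens, then hands its working memory to the next agent as a KV cache. It assumes the Hugging Face `transformers` API with a small placeholder model (`gpt2`); the function names, step count, and prompts are illustrative choices.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; the paper targets larger open LLMs
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()


def latent_thoughts(prompt_ids, past_key_values=None, steps=8):
    """Agent-internal latent reasoning: instead of sampling tokens, the
    final hidden state at each step is fed back as the next input
    embedding, so the rollout never leaves the continuous latent space."""
    inputs_embeds = model.get_input_embeddings()(prompt_ids)
    with torch.no_grad():
        for _ in range(steps):
            out = model(inputs_embeds=inputs_embeds,
                        past_key_values=past_key_values,
                        output_hidden_states=True,
                        use_cache=True)
            past_key_values = out.past_key_values
            # the last hidden state becomes the next "latent thought"
            inputs_embeds = out.hidden_states[-1][:, -1:, :]
    return past_key_values  # the agent's latent working memory (KV cache)


def decode_with_shared_memory(trigger_ids, past_key_values, max_new_tokens=20):
    """Receiving agent: conditions on the transferred KV cache and
    greedily decodes a textual answer."""
    generated, next_input = trigger_ids, trigger_ids
    with torch.no_grad():
        for _ in range(max_new_tokens):
            out = model(input_ids=next_input,
                        past_key_values=past_key_values,
                        use_cache=True)
            past_key_values = out.past_key_values
            next_input = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
            generated = torch.cat([generated, next_input], dim=-1)
    return generated


# Agent A reasons in latent space over the task prompt.
prompt_ids = tokenizer("Question: what is 17 * 24?", return_tensors="pt").input_ids
shared_cache = latent_thoughts(prompt_ids)

# Agent B (same weights here for simplicity) inherits A's working memory
# losslessly via the KV cache and writes out the final answer in text.
trigger_ids = tokenizer("\nAnswer:", return_tensors="pt").input_ids
output_ids = decode_with_shared_memory(trigger_ids, shared_cache)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

In the paper's framing, skipping token decoding during inter-agent hand-off is what yields both the token savings and the speedup, since intermediate reasoning never has to be serialized into text and re-read by the next agent.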