This collaboration between MIT, NUS, NYU, Microsoft, UW, Columbia, and NTU describes an inference-time retrieval enhancement to Chain of Thought reasoning. The researchers introduce MATTRL, a framework designed to improve the reasoning of Large Language Models (LLMs) through multi-agent collaboration and reinforcement learning. The system organizes specialized AI agents into Multidisciplinary Teams (MDTs) to tackle complex tasks in fields such as rare disease diagnosis and pedagogy. It follows a structured consensus-building workflow in which specialists contribute individual updates that are synthesized into a shared report. To refine performance, the system applies credit assignment methods, specifically the Difference Rewards approach, to identify high-quality strategies from successful interactions and reuse them. By extracting these reusable experiences, the framework provides dense guidance that helps agents anchor on key evidence and maintain honest uncertainty. Overall, the research shows how collaborative intelligence and targeted feedback can improve the precision of AI in specialized domains.

If you're not familiar with Test-Time Reinforcement Learning, you can review our earlier episode that covered it: https://open.spotify.com/episode/1rgQtzHZ3SFjDNpxghSjKd?si=zY9jThrYTgSlW0sjdVPLSg

Source: January 15, 2026 https://arxiv.org/pdf/2601.09667
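
For listeners curious about the Difference Rewards idea mentioned above, here is a minimal sketch of the general technique: each agent is credited with the team score minus the score the team would have earned without that agent. This is not the paper's implementation. The agent names, the evidence labels, and the `coverage_score` function are all hypothetical, chosen only to make the counterfactual comparison concrete.

```python
from typing import Callable, Dict, Set

# Hypothetical set of evidence items a good team report should cover.
KEY_EVIDENCE: Set[str] = {"a", "b", "c", "d"}


def coverage_score(findings: Dict[str, Set[str]]) -> float:
    """Toy team score: fraction of key evidence covered by any agent."""
    covered: Set[str] = set()
    for items in findings.values():
        covered |= items
    return len(covered & KEY_EVIDENCE) / len(KEY_EVIDENCE)


def difference_rewards(
    findings: Dict[str, Set[str]],
    team_score: Callable[[Dict[str, Set[str]]], float],
) -> Dict[str, float]:
    """Credit each agent with: score(full team) - score(team without agent)."""
    full = team_score(findings)
    credits: Dict[str, float] = {}
    for agent in findings:
        # Counterfactual team with this agent's contribution removed.
        without = {name: items for name, items in findings.items() if name != agent}
        credits[agent] = full - team_score(without)
    return credits


# Hypothetical specialists and the evidence each one surfaced.
findings = {
    "neurologist": {"a", "b"},
    "geneticist": {"b", "c"},
    "radiologist": {"b"},
}

credits = difference_rewards(findings, coverage_score)
for agent, credit in credits.items():
    print(f"{agent}: {credit:.2f}")
```

In this toy setup the radiologist earns zero credit because the only evidence it surfaced was already covered by the others, while the two agents who each contributed a unique item earn positive credit. That per-agent signal is what makes it possible to pick out which contributions are worth reusing.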