This episode explores the challenge of getting self-interested AI agents to cooperate without hardcoding cooperative behavior, examining a 2026 Google paper on multi-agent cooperation through in-context co-player inference. The hosts build up the technical foundations carefully, explaining why standard reinforcement learning breaks down in multi-agent settings due to non-stationarity, and how social dilemmas like the Prisoner's Dilemma cause agents to reliably converge on mutual defection even when cooperation would benefit everyone. The discussion traces the lineage of learning-aware agents, particularly LOLA, which achieved cooperation by differentiating through an opponent's gradient updates — a clever but architecturally demanding approach. The paper under review argues that training a transformer on a diverse pool of co-players lets in-context learning produce emergent cooperation without any of that machinery. Listeners interested in the intersection of game theory, multi-agent RL, and modern sequence modeling will find the episode's careful unpacking of why prior approaches fell short — and what the new framing claims to replace — genuinely illuminating.
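The mutual-defection trap the hosts describe can be checked directly on a payoff table. A minimal sketch in Python, using the conventional Prisoner's Dilemma values T=5, R=3, P=1, S=0 (these numbers are the textbook parameterization, not taken from the paper under review):

```python
# One-shot Prisoner's Dilemma payoffs as (row player, column player).
# Conventional values: T=5 (temptation), R=3 (reward), P=1 (punishment), S=0 (sucker).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def best_response(opponent_action):
    """Row player's payoff-maximizing action against a fixed opponent action."""
    return max(["C", "D"], key=lambda a: PAYOFF[(a, opponent_action)][0])

# Defection strictly dominates: it is the best response whether the
# opponent cooperates or defects...
assert best_response("C") == "D"
assert best_response("D") == "D"

# ...yet the resulting equilibrium (D, D) pays both players less than (C, C).
assert PAYOFF[("D", "D")][0] < PAYOFF[("C", "C")][0]
print("Equilibrium:", ("D", "D"), "payoffs:", PAYOFF[("D", "D")])
```

Because the dominance argument applies to each player separately, self-interested learners reliably slide to (D, D) even though both would prefer (C, C); this is the dilemma the episode builds on.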
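LOLA's trick of differentiating through an opponent's gradient update can be illustrated on a toy two-parameter differentiable game. This is only a sketch of the idea: `V1`, `V2`, the opponent learning rate, and the finite-difference machinery are all invented for the example and are not LOLA's or the paper's actual construction:

```python
# Toy sketch of the LOLA idea: player 1 differentiates its value V1 through
# the opponent's anticipated one-step naive update, rather than treating the
# opponent's parameter y as a constant. V1, V2, and all constants are
# illustrative only.

def V1(x, y):
    # Player 1's value depends on both players' parameters.
    return x * y - 0.5 * x * x

def V2(x, y):
    # Player 2's value also couples x and y, so shaping is possible.
    return -x * y - 0.5 * y * y

def d1(f, x, eps=1e-5):
    """Central finite-difference derivative of a one-argument function."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

def naive_grad_p1(x, y):
    # Independent learner: opponent parameter y is treated as fixed.
    return d1(lambda x_: V1(x_, y), x)

def lola_grad_p1(x, y, opp_lr=0.1):
    # Anticipate the opponent's naive step y'(x) = y + opp_lr * dV2/dy(x, y),
    # then differentiate V1(x, y'(x)) with respect to x. Because dV2/dy
    # depends on x, this picks up an extra "shaping" term the naive
    # gradient misses.
    look_ahead = lambda x_: V1(x_, y + opp_lr * d1(lambda y_: V2(x_, y_), y))
    return d1(look_ahead, x)

x0, y0 = 1.0, 1.0
print("naive:", naive_grad_p1(x0, y0))  # ~0.0
print("LOLA: ", lola_grad_p1(x0, y0))   # ~-0.3
```

At (1, 1) the naive gradient vanishes while the look-ahead gradient does not; that difference is the shaping signal LOLA exploits, and it is also why LOLA needs access to (or estimates of) the opponent's gradient, the architectural burden the episode contrasts with the in-context approach.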
Sources:
1. Multi-agent cooperation through in-context co-player inference — Marissa A. Weis, Maciej Wołczyk, Rajai Nasser, Rif A. Saurous, Blaise Agüera y Arcas, João Sacramento, Alexander Meulemans, 2026
http://arxiv.org/abs/2602.16301
2. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments — Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch, 2017
https://scholar.google.com/scholar?q=Multi-Agent+Actor-Critic+for+Mixed+Cooperative-Competitive+Environments
3. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning — Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson, 2018
https://scholar.google.com/scholar?q=QMIX:+Monotonic+Value+Function+Factorisation+for+Deep+Multi-Agent+Reinforcement+Learning
4. A Survey and Critique of Multiagent Deep Reinforcement Learning — Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor, 2019
https://scholar.google.com/scholar?q=A+Survey+and+Critique+of+Multiagent+Deep+Reinforcement+Learning
5. Multi-Agent Transformer: Scalable Cooperative Multi-Agent Reinforcement Learning — Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang, 2022
https://scholar.google.com/scholar?q=Multi-Agent+Transformer:+Scalable+Cooperative+Multi-Agent+Reinforcement+Learning
6. Language Models are Few-Shot Learners — Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al., 2020
https://scholar.google.com/scholar?q=Language+Models+are+Few-Shot+Learners
7. An Explanation of In-Context Learning as Implicit Bayesian Inference — Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma, 2022
https://scholar.google.com/scholar?q=An+Explanation+of+In-Context+Learning+as+Implicit+Bayesian+Inference
8. Transformers Learn In-Context by Gradient Descent — Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, Max Vladymyrov, 2023
https://scholar.google.com/scholar?q=Transformers+Learn+In-Context+by+Gradient+Descent
9. Emergent Complexity via Multi-Agent Competition — Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, Igor Mordatch, 2018
https://scholar.google.com/scholar?q=Emergent+Complexity+via+Multi-Agent+Competition
10. Emergent Tool Use from Multi-Agent Interaction — Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch, 2019
https://scholar.google.com/scholar?q=Emergent+Tool+Use+from+Multi-Agent+Interaction
11. Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning — Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas, 2019
https://scholar.google.com/scholar?q=Social+Influence+as+Intrinsic+Motivation+for+Multi-Agent+Deep+Reinforcement+Learning
12. Cooperative Multi-Agent Learning: The State of the Art — Liviu Panait, Sean Luke, 2005
https://scholar.google.com/scholar?q=Cooperative+Multi-Agent+Learning:+The+State+of+the+Art
13. The Evolution of Cooperation — Robert Axelrod, William D. Hamilton, 1981
https://scholar.google.com/scholar?q=The+Evolution+of+Cooperation
14. Iterated Prisoner's Dilemma Contains Strategies that Dominate Any Evolutionary Opponent — William H. Press, Freeman J. Dyson, 2012
https://scholar.google.com/scholar?q=Iterated+Prisoner's+Dilemma+Contains+Strategies+that+Dominate+Any+Evolutionary+Opponent
15. Evolutionary Instability of Zero-Determinant Strategies Demonstrates That Winning Is Not Everything — Christoph Adami, Arend Hintze, 2013
https://scholar.google.com/scholar?q=Evolutionary+Instability+of+Zero-Determinant+Strategies+Demonstrates+That+Winning+Is+Not+Everything
16. Learning with Opponent-Learning Awareness — Jakob Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch, 2018
https://scholar.google.com/scholar?q=Learning+with+Opponent-Learning+Awareness
17. Model-Free Opponent Shaping — Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster, 2022
https://scholar.google.com/scholar?q=Model-Free+Opponent+Shaping
18. In-context reinforcement learning with algorithm distillation — Laskin, M., Wang, L., Oh, J., Parisotto, E., Spencer, S., Steigerwald, R., Strouse, D., Hansen, S., Filos, A., Brooks, E., Gazeau, M., Sahni, H., Singh, S., Mnih, V., 2023
https://scholar.google.com/scholar?q=In-context+reinforcement+learning+with+algorithm+distillation
19. RL^2: Fast reinforcement learning via slow reinforcement learning — Duan, Y., Schulman, J., Chen, X., Bartlett, P. L., Sutskever, I., Abbeel, P., 2016
https://scholar.google.com/scholar?q=RL^2:+Fast+reinforcement+learning+via+slow+reinforcement+learning
20. Cooperating with unknown teammates in complex domains by acting carefully with information — Aghajohari, M., Duque, J., Cooijmans, T., Courville, A., 2024
https://scholar.google.com/scholar?q=Cooperating+with+unknown+teammates+in+complex+domains+by+acting+carefully+with+information
21. From naive to learning-aware: Emergence of cooperative behaviors in multi-agent systems — Meulemans, A., et al., 2025
https://scholar.google.com/scholar?q=From+naive+to+learning-aware:+Emergence+of+cooperative+behaviors+in+multi-agent+systems
22. Generative agents: Interactive simulacra of human behavior — Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., Bernstein, M. S., 2023
https://scholar.google.com/scholar?q=Generative+agents:+Interactive+simulacra+of+human+behavior
23. Do Pre-trained Transformers Really Learn In-context by Gradient Descent? — Shen et al. (attribution approximate), 2023-2024
https://scholar.google.com/scholar?q=Do+Pre-trained+Transformers+Really+Learn+In-context+by+Gradient+Descent?
24. When is diversity rewarded in cooperative multi-agent learning? — authors not confirmed, recent
https://scholar.google.com/scholar?q=When+is+diversity+rewarded+in+cooperative+multi-agent+learning?
25. The evolution of zero-determinant strategies in public goods game — authors not confirmed, recent
https://scholar.google.com/scholar?q=The+evolution+of+zero-determinant+strategies+in+public+goods+game
26. Uncoupled learning of differential Stackelberg equilibria with commitments — authors not confirmed, recent
https://scholar.google.com/scholar?q=Uncoupled+learning+of+differential+Stackelberg+equilibria+with+commitments
27. Non-coercive extortion in game theory — authors not confirmed, recent
https://scholar.google.com/scholar?q=Non-coercive+extortion+in+game+theory
28. Reciprocal reward influence encourages cooperation from self-interested agents — authors not confirmed, recent
https://scholar.google.com/scholar?q=Reciprocal+reward+influence+encourages+cooperation+from+self-interested+agents
29. AI Post Transformers: In-Context Learning as Implicit Learning Algorithms — Hal Turing & Dr. Ada Shannon
https://podcasters.spotify.com/pod/show/12146088098/episodes/In-Context-Learning-as-Implicit-Learning-Algorithms-e39sjmn
30. AI Post Transformers: Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts — Hal Turing & Dr. Ada Shannon
https://podcasters.spotify.com/pod/show/12146088098/episodes/Zero-Shot-Context-Generalization-in-Reinforcement--Learning-from-Few-Training-Contexts-e3fi0t5
31. AI Post Transformers: Experiential Reinforcement Learning: Internalizing Reflection for Better Policy Training — Hal Turing & Dr. Ada Shannon
https://podcasters.spotify.com/pod/show/12146088098/episodes/Experiential-Reinforcement-Learning-Internalizing-Reflection-for-Better-Policy-Training-e3fbel0
Interactive Visualization: Emergent Cooperation in Self-Interested Multi-Agent AI