This episode explores a 2025 paper on MemSearcher, an LLM search agent that replaces full trajectory replay with a compact learned memory and is trained end-to-end with reinforcement learning. It explains how this approach targets a core weakness of ReAct-style agents: context windows that grow with every turn, increasing cost, latency, and noise. It contrasts MemSearcher with both vanilla ReAct and Search-R1, which improves search behavior without explicitly learning what to retain. The discussion connects reinforcement learning, retrieval-augmented generation, agent memory systems, and reasoning-budget control, arguing that context management should be treated as part of the learned policy rather than an afterthought. Listeners interested in AI agents will find it compelling because it frames memory compression not just as an efficiency trick but as a potentially important source of better search and reasoning performance.
Sources:
1. MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning — Qianhao Yuan, Jie Lou, Zichao Li, Jiawei Chen, Yaojie Lu, Hongyu Lin, Le Sun, Debing Zhang, Xianpei Han, 2025
http://arxiv.org/abs/2511.02805
2. ReAct: Synergizing Reasoning and Acting in Language Models — Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao, 2023
https://scholar.google.com/scholar?q=ReAct:+Synergizing+Reasoning+and+Acting+in+Language+Models
3. Search-R1 — Jin et al., 2025
https://scholar.google.com/scholar?q=Search-R1
4. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models — Zhihong Shao, Peiyi Wang, Qihao Zhu, et al., 2024
https://scholar.google.com/scholar?q=DeepSeekMath:+Pushing+the+Limits+of+Mathematical+Reasoning+in+Open+Language+Models
5. Proximal Policy Optimization Algorithms — John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, 2017
https://scholar.google.com/scholar?q=Proximal+Policy+Optimization+Algorithms
6. Training Language Models to Self-Correct via Reinforcement Learning — Aviral Kumar, Vincent Zhuang, Rishabh Agarwal, et al., 2024
https://scholar.google.com/scholar?q=Training+Language+Models+to+Self-Correct+via+Reinforcement+Learning
7. Reflexion: Language Agents with Verbal Reinforcement Learning — Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao, 2023
https://scholar.google.com/scholar?q=Reflexion:+Language+Agents+with+Verbal+Reinforcement+Learning
8. Generative Agents: Interactive Simulacra of Human Behavior — Joon Sung Park, Joseph O'Brien, Carrie Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein, 2023
https://scholar.google.com/scholar?q=Generative+Agents:+Interactive+Simulacra+of+Human+Behavior
9. ACon: Optimizing Context Compression for Long-Horizon LLM Agents — authors not listed, 2025
https://scholar.google.com/scholar?q=ACon:+Optimizing+Context+Compression+for+Long-Horizon+LLM+Agents
10. Active Context Compression: Autonomous Memory Management in LLM Agents — authors not listed, 2025
https://scholar.google.com/scholar?q=Active+Context+Compression:+Autonomous+Memory+Management+in+LLM+Agents
11. From Lossy to Verified: A Provenance-Aware Tiered Memory for Agents — authors not listed, 2025
https://scholar.google.com/scholar?q=From+Lossy+to+Verified:+A+Provenance-Aware+Tiered+Memory+for+Agents
12. GRPO-λ: Credit Assignment Improves LLM Reasoning — authors not listed, 2025
https://scholar.google.com/scholar?q=GRPO-:+Credit+Assignment+Improves+LLM+Reasoning
13. InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning — authors not listed, 2025
https://scholar.google.com/scholar?q=InT:+Self-Proposed+Interventions+Enable+Credit+Assignment+in+LLM+Reasoning
14. CAPO: Towards Enhancing LLM Reasoning through Generative Credit Assignment — authors not listed, 2025
https://scholar.google.com/scholar?q=CAPO:+Towards+Enhancing+LLM+Reasoning+through+Generative+Credit+Assignment
15. AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning — authors not listed, 2025
https://scholar.google.com/scholar?q=AgentGym-RL:+Training+LLM+Agents+for+Long-Horizon+Decision+Making+through+Multi-Turn+Reinforcement+Learning
16. AI Post Transformers: Mem0: Scalable Long-Term Memory for AI Agents — Hal Turing & Dr. Ada Shannon, Tue,
https://podcast.do-not-panic.com/episodes/mem0-scalable-long-term-memory-for-ai-agents/
17. AI Post Transformers: Agentic AI and the Next Intelligence Explosion — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/2026-03-28-agentic-ai-and-the-next-intelligence-exp-d06561.mp3
18. AI Post Transformers: Experiential Reinforcement Learning: Internalizing Reflection for Better Policy Training — Hal Turing & Dr. Ada Shannon, Fri,
https://podcast.do-not-panic.com/episodes/experiential-reinforcement-learning-internalizing-reflection-for-better-policy-t/
19. AI Post Transformers: NVIDIA: TTT-E2E: Unlocking Long-Context Learning via End-to-End Test-Time Training — Hal Turing & Dr. Ada Shannon, Sat,
https://podcast.do-not-panic.com/episodes/nvidia-ttt-e2e-unlocking-long-context-learning-via-end-to-end-test-time-training/
20. AI Post Transformers: Doc-to-LoRA: Internalizing Context as LoRA — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/2026-03-29-doc-to-lora-internalizing-context-as-lor-8dd5ec.mp3
Interactive Visualization: MEMSEARCHER: Reinforcement Learning for LLM Memory Management