Daily Tech Feed: From the Labs


Episode 0019: Agents of Chaos

Why it matters. Someone finally ran a proper pentest on autonomous AI agents: not a benchmark or a toy environment, but a live deployment with persistent memory, email, Discord, file systems, and shell execution. Agents of Chaos documents eleven distinct failure modes found over two weeks of red-teaming by twenty AI researchers, and every one of them maps to a known vulnerability class in traditional security engineering. The episode then pivots to EMPO², a hybrid reinforcement learning framework from Microsoft Research and KAIST that starts to address the deeper problem: training agents that explore alternatives and learn from failures instead of defaulting to catastrophic compliance.

Northeastern University, Carnegie Mellon University, Microsoft Research, KAIST. The Agents of Chaos study is a multi-institutional effort led out of Northeastern's Bau Lab with collaborators across CMU, UBC, UT Austin, and several other institutions. The paper is available on arXiv (2602.20021), with a project page at agentsofchaos.baulab.info. EMPO² comes from Microsoft Research Asia and KAIST, available on arXiv (2602.23008) with code on GitHub and a project page at agent-lightning.github.io.

The Researchers. Agents of Chaos is led by Natalie Shapira (Northeastern, interpretability and Theory of Mind) and David Bau (Northeastern, interpretable deep networks, ex-Google, Sloan fellow), with significant contributions from Chris Wendler, Avery Yen, Gabriele Sarti, Koyena Pal, Christoph Riedl (Northeastern, computational social science), Maarten Sap (CMU, social AI), Tomer Ullman (Harvard, cognitive science), and David Manheim — thirty-eight authors in total. EMPO² is the work of Zeyuan Liu, Jeonghye Kim, Xufang Luo (MSRA), Dongsheng Li (MSRA Shanghai, principal research manager), and Yuqing Yang.

Key Technical Concepts. The eleven failure modes documented in Agents of Chaos map directly to established vulnerability taxonomies: unauthorized compliance is CWE-285 (improper authorization), prompt injection with lateral movement echoes OWASP LLM01, and the circular agent-to-agent relay that ran for nine days and consumed more than 60,000 tokens is classic resource exhaustion (CWE-400). The fundamental structural cause is token indistinguishability: agents cannot separate instructions from data, a problem formalized in prior work on indirect prompt injection. EMPO² addresses the remediation layer through a dual-update paradigm that combines parametric learning (weight updates via RL, extending GRPO) with non-parametric learning (an external memory of self-generated "tips" from trajectory reflection), plus intrinsic rewards based on state novelty to encourage exploration. The framework achieved a 128.6% improvement over GRPO on ScienceWorld and demonstrated out-of-distribution adaptation within ten trials using memory alone, with no weight updates required.
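For listeners who think in code, here is a rough Python sketch of the three ingredients named above: a state-novelty bonus, GRPO-style group-relative advantages, and an external memory of tips. It is an illustration under stated assumptions, not the EMPO² implementation. The count-based bonus (1/sqrt of visit count), the class and function names, and the toy states and rewards are all hypothetical; the show notes do not say how the paper measures novelty or stores tips. The advantage normalization follows the commonly described GRPO form (reward minus group mean, divided by group standard deviation).

```python
import math
from collections import Counter
from statistics import mean, pstdev


class NoveltyBonus:
    """Hypothetical count-based intrinsic reward: shrinks as a state is revisited."""

    def __init__(self, scale=1.0):
        self.scale = scale
        self.visits = Counter()

    def __call__(self, state):
        self.visits[state] += 1
        return self.scale / math.sqrt(self.visits[state])


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward against its rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


class TipMemory:
    """Hypothetical non-parametric memory: per-task tips, no weight updates."""

    def __init__(self):
        self.tips = {}

    def add(self, task, tip):
        self.tips.setdefault(task, []).append(tip)

    def retrieve(self, task, k=3):
        # Return the k most recent tips for this task (empty list if none).
        return self.tips.get(task, [])[-k:]


# Toy rollout group: four attempts at one task (made-up states and rewards).
bonus = NoveltyBonus()
task_rewards = [1.0, 0.0, 0.0, 1.0]
final_states = ["door_open", "start", "start", "door_open"]

# Parametric side: task reward plus novelty bonus feeds the advantage estimate.
total_rewards = [r + bonus(s) for r, s in zip(task_rewards, final_states)]
advantages = group_relative_advantages(total_rewards)

# Non-parametric side: a reflection step would write a tip after a failure.
memory = TipMemory()
memory.add("open_door", "Check the inventory for a key before trying the door.")
prompt_tips = memory.retrieve("open_door")
```

In this sketch the advantages would drive the weight update, while the retrieved tips would simply be prepended to the next prompt, which is how adaptation without weight updates could work in principle.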

Daily Tech Feed: From the Labs is available on Apple Podcasts, Spotify, and wherever fine podcasts are distributed. Visit us at pod.c457.org for all our shows. New episodes daily.


By Daily Tech Feed