Learning GenAI via SOTA Papers

EP160: [AgentSys] Securing AI agents with hierarchical memory


Listen Later

The paper introduces AGENTSYS, a novel framework designed to protect Large Language Model (LLM) agents from indirect prompt injection (IPI) attacks through explicit hierarchical memory management. Conventional LLM agents are vulnerable because they indiscriminately accumulate all tool outputs and reasoning traces in their context window, allowing malicious instructions to persist across multiple reasoning steps and degrading decision-making through verbose, non-essential content.

Key features of the AGENTSYS architecture include:

  • Hierarchical Isolation: The system organizes agents into a tree structure where a main agent spawns short-lived worker agents for tool invocations.
  • Memory Management: Raw external data and subtask reasoning traces are confined to isolated worker contexts and never enter the main agent's memory.
  • Schema-Validated Communication: The main agent defines a specific "intent" (a JSON-like schema) for each tool call, and worker agents distill raw outputs into compact, validated return values that must pass a syntactic gate.
  • Mediated Recursion: Any recursive tool calls within subtasks are gated by an LLM-based validator and a sanitize-restart mechanism to handle potentially adversarial content.

Evaluations on benchmarks like AgentDojo and ASB show that AGENTSYS achieves state-of-the-art security, reaching a 0.78% attack success rate (ASR) on AgentDojo while improving benign utility (64.36% compared to 63.54% for undefended baselines). By keeping the main agent's working memory clean and focused, AGENTSYS effectively prevents attack persistence and utility degradation in complex, multi-step workflows.

...more
View all episodesView all episodes
Download on the App Store

Learning GenAI via SOTA PapersBy Yun Wu