
Sign up to save your podcasts
Or


The paper introduces AGENTSYS, a novel framework designed to protect Large Language Model (LLM) agents from indirect prompt injection (IPI) attacks through explicit hierarchical memory management. Conventional LLM agents are vulnerable because they indiscriminately accumulate all tool outputs and reasoning traces in their context window, allowing malicious instructions to persist across multiple reasoning steps and degrading decision-making through verbose, non-essential content.
Key features of the AGENTSYS architecture include:
Evaluations on benchmarks like AgentDojo and ASB show that AGENTSYS achieves state-of-the-art security, reaching a 0.78% attack success rate (ASR) on AgentDojo while improving benign utility (64.36% compared to 63.54% for undefended baselines). By keeping the main agent's working memory clean and focused, AGENTSYS effectively prevents attack persistence and utility degradation in complex, multi-step workflows.
By Yun WuThe paper introduces AGENTSYS, a novel framework designed to protect Large Language Model (LLM) agents from indirect prompt injection (IPI) attacks through explicit hierarchical memory management. Conventional LLM agents are vulnerable because they indiscriminately accumulate all tool outputs and reasoning traces in their context window, allowing malicious instructions to persist across multiple reasoning steps and degrading decision-making through verbose, non-essential content.
Key features of the AGENTSYS architecture include:
Evaluations on benchmarks like AgentDojo and ASB show that AGENTSYS achieves state-of-the-art security, reaching a 0.78% attack success rate (ASR) on AgentDojo while improving benign utility (64.36% compared to 63.54% for undefended baselines). By keeping the main agent's working memory clean and focused, AGENTSYS effectively prevents attack persistence and utility degradation in complex, multi-step workflows.