


Paper Link: https://arxiv.org/abs/2603.11088
Summary:
This paper presents what the authors describe as the first systematic and comprehensive survey of AI agent security, addressing the unique challenges created by hybrid systems that combine large language models (LLMs) with traditional software components. The authors introduce a framework for understanding the security landscape, organized around three areas: design dimensions, attack vectors, and defense mechanisms.
Key aspects of the paper's systematization include:
• Design Dimensions: The survey identifies seven key design dimensions—input trust, access sensitivity, workflow, action, memory, tool, and user interface—analyzing how increased flexibility in these areas broadens an agent's attack surface.
• Attack Taxonomy: The authors categorize attacks based on three threat models (external, user-level, and internal adversaries) and identify seven specific security risks (R1–R7), such as indirect prompt injection, private data leakage, and unauthorized actions.
• Defense Landscape: The paper surveys existing defense strategies, categorizing them into runtime protection (e.g., guardrails, monitoring), secure-by-design (e.g., privilege separation), identity and access management, and component hardening.
• Case Studies: To highlight existing security gaps, the authors conduct case studies on real-world coding and web agents, including a detailed analysis of AutoGPT vulnerabilities like command injection and path traversal.
Ultimately, the work serves as a handbook for researchers and developers. It points out that while progress has been made in mapping the problem space, practical and adaptive defenses remain largely elusive.
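To make the two vulnerability classes named in the case studies concrete, here is a minimal Python sketch of how an agent's file and shell tools might guard against path traversal and command injection. It is an illustration only, not code from the paper or from AutoGPT: the function names `resolve_in_workspace` and `build_argv` and the `ALLOWED_COMMANDS` allowlist are hypothetical, and it assumes Python 3.9 or later.

```python
from pathlib import Path
import shlex

# Hypothetical allowlist of programs the agent may launch (an assumption).
ALLOWED_COMMANDS = {"ls", "cat", "echo"}


def resolve_in_workspace(workspace: str, requested: str) -> Path:
    """Resolve `requested` inside `workspace`, rejecting paths that escape it."""
    root = Path(workspace).resolve()
    # Joining with an absolute path discards `root`, and ".." segments are
    # collapsed by resolve(), so both escape attempts show up in `candidate`.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to: Python 3.9+
        raise PermissionError(f"path escapes workspace: {requested!r}")
    return candidate


def build_argv(command_line: str) -> list[str]:
    """Split a model-proposed command into an argv list and check the program.

    The returned list is meant to be executed without a shell, so characters
    such as ';' or '|' stay literal arguments instead of chaining commands.
    """
    argv = shlex.split(command_line)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowed: {command_line!r}")
    return argv
```

A caller would pass the list from `build_argv` to `subprocess.run(argv)` without `shell=True`, and open files only through the path returned by `resolve_in_workspace`. Checks like these reduce two specific risks but do not amount to a complete defense.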
By Yun Wu