AI Post Transformers

AI Agent Traps and Prompt Injection


Listen Later

This episode explores why AI agents become a fundamentally different security problem once language models can browse the web, read email, call tools, store memory, and act inside real software environments. It explains prompt injection as the core boundary failure, showing how webpages, emails, retrieved notes, or API responses can be mistaken for trusted instructions, turning ordinary content into an attack vector with real operational consequences. The discussion then sharpens the distinction between one-off prompt attacks and more systemic failures such as memory poisoning and multi-agent compromise, where corrupted state can persist across sessions or spread through delegated workflows. A listener would find it interesting because it frames agent safety as a concrete systems-security challenge, not just a model-behavior quirk, and clarifies why greater capability also widens the blast radius of failure.
Sources:
1. AI Agent Traps and Prompt Injection
/tmp/submission-source-_s144w4z.txt
2. Ignore Previous Prompt: Attack Techniques For Language Models — Fábio Perez, Ian Ribeiro, 2022
https://scholar.google.com/scholar?q=Ignore+Previous+Prompt:+Attack+Techniques+For+Language+Models
3. Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection — Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz, 2023
https://scholar.google.com/scholar?q=Not+what+you've+signed+up+for:+Compromising+Real-World+LLM-Integrated+Applications+with+Indirect+Prompt+Injection
4. Prompt Injection attack against LLM-integrated Applications — Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, Yang Liu, 2023
https://scholar.google.com/scholar?q=Prompt+Injection+attack+against+LLM-integrated+Applications
5. Prompt Injection Attacks and Defenses in LLM-Integrated Applications — Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong, 2023
https://scholar.google.com/scholar?q=Prompt+Injection+Attacks+and+Defenses+in+LLM-Integrated+Applications
6. AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways — Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, Yang Xiang, 2024
https://scholar.google.com/scholar?q=AI+Agents+Under+Threat:+A+Survey+of+Key+Security+Challenges+and+Future+Pathways
7. Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents — Christian Schroeder de Witt, 2025
https://scholar.google.com/scholar?q=Open+Challenges+in+Multi-Agent+Security:+Towards+Secure+Systems+of+Interacting+AI+Agents
8. Red-Teaming LLM Multi-Agent Systems via Communication Attacks — Pengfei He, Yupin Lin, Shen Dong, Han Xu, Yue Xing, Hui Liu, 2025
https://scholar.google.com/scholar?q=Red-Teaming+LLM+Multi-Agent+Systems+via+Communication+Attacks
9. G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems — Shilong Wang, Guibin Zhang, Miao Yu, Guancheng Wan, Fanci Meng, Chongye Guo, Kun Wang, Yang Wang, 2025
https://scholar.google.com/scholar?q=G-Safeguard:+A+Topology-Guided+Security+Lens+and+Treatment+on+LLM-based+Multi-agent+Systems
10. InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents — Qiusi Zhan, Zhixiang Liang, Zifan Ying, Daniel Kang, 2024
https://scholar.google.com/scholar?q=InjecAgent:+Benchmarking+Indirect+Prompt+Injections+in+Tool-Integrated+Large+Language+Model+Agents
11. Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents — Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, Yongfeng Zhang, 2024
https://scholar.google.com/scholar?q=Agent+Security+Bench+(ASB):+Formalizing+and+Benchmarking+Attacks+and+Defenses+in+LLM-based+Agents
12. AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases — Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, Bo Li, 2024
https://scholar.google.com/scholar?q=AgentPoison:+Red-teaming+LLM+Agents+via+Poisoning+Memory+or+Knowledge+Bases
13. Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems — Donghyun Lee, Mo Tiwari, 2024
https://scholar.google.com/scholar?q=Prompt+Infection:+LLM-to-LLM+Prompt+Injection+within+Multi-Agent+Systems
14. Multi-Agent Systems Execute Arbitrary Malicious Code — Harold Triedman, Rishi D. Jha, Vitaly Shmatikov, 2025
https://scholar.google.com/scholar?q=Multi-Agent+Systems+Execute+Arbitrary+Malicious+Code
15. Prompt Injection Attacks on Large Language Models: A Survey of Attack Methods, Root Causes, and Defense Strategies — approx. survey by prompt-injection/security researchers, 2025
https://scholar.google.com/scholar?q=Prompt+Injection+Attacks+on+Large+Language+Models:+A+Survey+of+Attack+Methods,+Root+Causes,+and+Defense+Strategies
16. Prompting for LLM Security and RAG: A Survey from Zero-Shot to Automatic Prompt Optimization (APO) and Prompt-Injection Defenses — approx. security/RAG survey authors, 2025
https://scholar.google.com/scholar?q=Prompting+for+LLM+Security+and+RAG:+A+Survey+from+Zero-Shot+to+Automatic+Prompt+Optimization+(APO)+and+Prompt-Injection+Defenses
17. Veriguard: Enhancing LLM Agent Safety via Verified Code Generation — approx. systems/security authors, 2025
https://scholar.google.com/scholar?q=Veriguard:+Enhancing+LLM+Agent+Safety+via+Verified+Code+Generation
18. Enforcement Agents: Enhancing Accountability and Resilience in Multi-Agent AI Frameworks — approx. multi-agent safety authors, 2025
https://scholar.google.com/scholar?q=Enforcement+Agents:+Enhancing+Accountability+and+Resilience+in+Multi-Agent+AI+Frameworks
19. Monitoring LLM-Based Multi-Agent Systems Against Corruptions via Node Evaluation — approx. multi-agent monitoring authors, 2025
https://scholar.google.com/scholar?q=Monitoring+LLM-Based+Multi-Agent+Systems+Against+Corruptions+via+Node+Evaluation
20. Enhancing Robustness of LLM-Driven Multi-Agent Systems Through Randomized Smoothing — approx. robustness/safety authors, 2025
https://scholar.google.com/scholar?q=Enhancing+Robustness+of+LLM-Driven+Multi-Agent+Systems+Through+Randomized+Smoothing
21. Assessing and Enhancing the Robustness of LLM-Based Multi-Agent Systems Through Chaos Engineering — approx. systems robustness authors, 2025
https://scholar.google.com/scholar?q=Assessing+and+Enhancing+the+Robustness+of+LLM-Based+Multi-Agent+Systems+Through+Chaos+Engineering
22. AI Post Transformers: Memory in the Age of AI Agents: Forms, Functions, Dynamics — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/2026-03-16-memory-in-the-age-of-ai-agents-forms-fun-5abc60.mp3
23. AI Post Transformers: NeurIPS 2025: Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/neurips-2025-agentic-plan-caching-test-time-memory-for-fast-and-cost-efficient-l/
24. AI Post Transformers: ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory — Hal Turing & Dr. Ada Shannon, 2025
https://podcast.do-not-panic.com/episodes/reasoningbank-scaling-agent-self-evolving-with-reasoning-memory/
25. AI Post Transformers: Qwen3Guard: Streaming Three-Way Safety Classification for LLMs — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/2026-03-16-qwen3guard-streaming-three-way-safety-cl-26b0ef.mp3
Interactive Visualization: AI Agent Traps and Prompt Injection
...more
View all episodesView all episodes
Download on the App Store

AI Post TransformersBy mcgrof