Why Your AI Agent Won't Stop Working — and Each Model Falls for a Different Trap
Source: LoopTrap: Termination Poisoning Attacks on LLM Agents
Paper was published on May 07, 2026
This episode was AI-generated on May 9, 2026. The script was written by an AI language model and the host voices were synthesized by Eleven Labs. The producer is not affiliated with Anthropic or Eleven Labs.
A new paper shows that one or two sentences hidden in a webpage can keep an AI agent grinding away for hours, silently running up the bill — and that each frontier model has its own distinct profile of which manipulations it falls for. The result is a kind of behavioral fingerprint for LLMs that has implications well beyond security, including how you should pick a model for any agent deployment.
Key Takeaways
Why termination — not output — is the real attack surface for agents, and how short, plausible-sounding injections can trap them in expensive reasoning loops
How attacks inspired by cognitive biases (sunk cost, authority, recursive verification, positive reinforcement) translate into one- or two-sentence prompts that work in the wild
Concrete numbers: ~3.5x average slowdown across eight frontier models, peaks of 25x, and an 86% attack success rate at the 2x threshold
The mirror-image vulnerability profiles of Kimi-K2-Thinking (folds to fake authority) and Claude Sonnet 4.5 (spirals into recursive verification), and what that suggests about model selection
Why open-ended research tasks are far more exploitable than math and logic, where ground truth gives the agent a real stopping signal
Where the paper's lab numbers may overstate real-world risk, and where the cognitive-bias framing outruns what's actually been demonstrated
00:00 — A new attack surface: when, not what
Why going after an agent's termination decision is fundamentally different from prompt injections aimed at outputs or tool calls.
03:47 — The attack catalog
A walkthrough of the ten injection templates — positive reinforcement, authority override, recursive decomposition, sunk cost, and more — and what makes each one land.
07:34 — Headline numbers across eight frontier models
The Step Amplification Factor results from 3,000 runs per model and what the 3.5x average and 25x peaks actually mean operationally.
11:21 — Behavioral fingerprints and the Kimi vs. Claude contrast
How aggregating attack outcomes produced stable per-model personality profiles, with Kimi and Claude as near mirror images on authority and verification.
15:09 — LoopTrap: fingerprinting and profile-guided attacks
The three-stage system that profiles a target agent for the cost of eight runs, then synthesizes task-grounded attacks tuned to its biases.
18:56 — Why task type matters — math resists, history doesn't
The finding that objectively verifiable tasks blunt these attacks, while open-ended research tasks have no natural stopping point to defend.
22:43 — Skeptical read: what the paper does and doesn't show
Four concerns about simulated tools, the 2x success threshold, the cognitive-bias framing, and the absence of defense evaluation.
26:31 — Implications for builders and where the research goes next
Why behavioral profiles should inform model selection, and why durable defenses likely require external loop structure rather than fixing the model itself.
Recommended Reading
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection — The foundational paper on indirect prompt injection, the threat model LoopTrap repurposes from output corruption to termination corruption.
ReAct: Synergizing Reasoning and Acting in Language Models — The think-act-observe loop that the episode describes as the core surface that termination poisoning attacks. Worth reading to understand exactly where the 'am I done?' decision lives.
Reflexion: Language Agents with Verbal Reinforcement Learning — The self-critique mechanism that LoopTrap's stage-two attack synthesizer borrows to steer away from failed attacks. Useful context for how the same technique cuts both ways.
GAIA: A Benchmark for General AI Assistants — The multi-step task benchmark LoopTrap draws its sixty evaluation tasks from, including the open-ended research questions the episode flags as most vulnerable.
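
For builders who want something concrete to go with the closing chapter's point about external loop structure, here is a minimal sketch of a step budget enforced outside the model. It assumes the Step Amplification Factor can be read as steps taken under attack divided by steps taken on the clean task; the paper's exact definition may differ. Every name here (step_amplification, attack_success_rate, StepBudgetGuard, run_agent) is hypothetical and not taken from the paper or from any agent framework.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


def step_amplification(attacked_steps: int, baseline_steps: int) -> float:
    """Steps taken under attack divided by steps on the clean task.

    Assumption: a plain ratio in the spirit of the paper's metric,
    not necessarily its exact definition.
    """
    if baseline_steps <= 0:
        raise ValueError("baseline_steps must be positive")
    return attacked_steps / baseline_steps


def attack_success_rate(
    runs: Iterable[Tuple[int, int]], threshold: float = 2.0
) -> float:
    """Fraction of (attacked_steps, baseline_steps) pairs at or above threshold."""
    pairs = list(runs)
    if not pairs:
        raise ValueError("runs must be non-empty")
    hits = sum(1 for a, b in pairs if step_amplification(a, b) >= threshold)
    return hits / len(pairs)


class StepBudgetExceeded(RuntimeError):
    """Raised by the guard; the agent loop itself never gets to decide."""


@dataclass
class StepBudgetGuard:
    """Hypothetical external guard: caps steps at baseline * max_amplification."""

    baseline_steps: int
    max_amplification: float = 2.0
    steps_taken: int = 0

    def tick(self) -> None:
        # Check the *next* step against the budget before allowing it.
        next_ratio = step_amplification(self.steps_taken + 1, self.baseline_steps)
        if next_ratio > self.max_amplification:
            raise StepBudgetExceeded(
                f"stopped after {self.steps_taken} steps "
                f"(budget {self.max_amplification}x of {self.baseline_steps})"
            )
        self.steps_taken += 1


def run_agent(step_fn: Callable[[int], bool], guard: StepBudgetGuard) -> int:
    """Call step_fn(step_number) until it returns True or the guard trips.

    step_fn stands in for one think-act-observe iteration of a real agent.
    """
    while True:
        guard.tick()
        if step_fn(guard.steps_taken):
            return guard.steps_taken
```

The design choice worth noting is that the stopping rule lives in the harness, so an injected sentence that persuades the model to keep verifying cannot also raise the budget. Choosing baseline_steps per task type is left open here, and the notes above suggest open-ended research tasks would need the most care.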