When Smarter Agents Get Fooled by Three Extra Nodes in a Database
Source: Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning
Paper was published on May 10, 2026
This episode was AI-generated on May 12, 2026. The script was written by an AI language model and the host voices were synthesized by Eleven Labs. The producer is not affiliated with Anthropic or Eleven Labs.
Nine frontier models, three providers, 269 trials, and every single time the agent trusted a lie planted in its knowledge graph by an attacker who added just three nodes. A new paper defines an attack class called Oracle Poisoning and, along the way, uncovers a methodological problem that may mean a chunk of existing AI safety evaluation has been measuring the wrong thing.
Key Takeaways
Why Oracle Poisoning is genuinely distinct from prompt injection, RAG poisoning, training-data poisoning, and tool poisoning, and why that distinction matters
The delivery-mode finding: the same model rejects poisoned data inline but trusts it 100% when it arrives through a real SDK tool call, with implications for how every agentic safety evaluation is run
Why system prompt hardening has zero measurable effect against this attack, and which defenses (read-only access, multi-tool cross-verification) actually work
The asymmetry that makes this cheap: corrupting the knowledge graph that describes a codebase is dramatically easier than corrupting the codebase itself
The unsettling hypothesis that more capable reasoning may increase susceptibility, not reduce it, because better reasoners produce more confident wrong answers from corrupted premises
Where the paper's claims are strongest and where they reach, including the single-system empirical base and the missing human baseline

00:00 - The Plato's Cave framing and why reasoning quality isn't epistemic security
Setting up the core thesis that a better reasoner working from corrupted facts is no less wrong, just more convincingly wrong.
03:23 - What Oracle Poisoning is, and what it isn't
Walking through how the attack differs from prompt injection, RAG poisoning, training-data poisoning, and tool poisoning.
06:47 - The fake sanitizer attack in concrete detail
How adding three nodes to a 42-million-node graph flips an agent's SQL injection verdict, and how the agent rationalizes away disconfirming evidence.
10:11 - The economics: why the map is less defended than the territory
Why modifying the knowledge graph that describes code is dramatically cheaper than modifying the code itself.
13:34 - The empirical result: 269 out of 269
The cross-model evaluation across nine frontier models and the step-function jump from L1 to L2 attacker sophistication.
16:58 - The delivery-mode discovery
Why the same content is rejected inline but trusted through a real tool call, and what this means for the validity of existing safety evaluations.
20:22 - Steelman: where the paper's claims reach
The directed-prompt dependency, the single production system tested, and the missing human baseline.
23:46 - What defenses actually work, and which famously don't
Read-only access and multi-tool cross-verification work; generic skepticism prompts and system prompt hardening do not.
27:09 - The frame shift: risk moves from the model to its environment
Why agentic AI safety increasingly depends on the integrity of tools and data channels, not the model's reasoning.

Recommended Reading
Model Context Protocol Specification: The official specification for the tool-call channel the episode identifies as the trusted delivery pathway exploited by Oracle Poisoning.
Prompt Injection attacks against GPT-3 (Simon Willison): The original framing of prompt injection that the episode explicitly distinguishes Oracle Poisoning from. Useful context for understanding what makes the new attack class structurally different.
PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models: A closely related attack on retrieval-augmented systems that the episode contrasts with Oracle Poisoning. Useful for seeing how data-source corruption differs when the channel is RAG versus structured tool calls.
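
Illustrative sketch: the multi-tool cross-verification defense mentioned in the chapter notes can be pictured with a toy example. Everything below is an assumption made for illustration: the `Edge` type, the `SANITIZED_BY` relation, the function names, and the sample data are hypothetical and do not reproduce the paper's graph schema, its three planted nodes, or its tooling. The idea shown is only that a claim read from the knowledge graph is accepted when a second, independent source (here, the code itself) corroborates it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One hypothetical knowledge-graph fact: src --relation--> dst."""
    src: str
    relation: str
    dst: str


# Hypothetical graph contents. The SANITIZED_BY edge stands in for
# attacker-planted data claiming that a sanitizer protects handle_login.
GRAPH_EDGES = [
    Edge("handle_login", "CALLS", "run_query"),
    Edge("handle_login", "SANITIZED_BY", "sanitize_sql"),
]

# Hypothetical source snapshot, read through a second, independent tool.
SOURCE_FILES = {
    "auth.py": (
        "def handle_login(user):\n"
        '    return run_query("SELECT * FROM users WHERE name = " + user)\n'
    ),
}


def graph_claims_sanitizer(function_name, edges):
    """Return the sanitizer names the graph attributes to function_name."""
    return [
        e.dst
        for e in edges
        if e.src == function_name and e.relation == "SANITIZED_BY"
    ]


def source_mentions(symbol, files):
    """Crude check: does any source file contain a call to symbol?"""
    return any(f"{symbol}(" in text for text in files.values())


def cross_verify(function_name, edges, files):
    """Split graph sanitizer claims into source-confirmed and unconfirmed."""
    claimed = graph_claims_sanitizer(function_name, edges)
    confirmed = [s for s in claimed if source_mentions(s, files)]
    unconfirmed = [s for s in claimed if s not in confirmed]
    return {"confirmed": confirmed, "unconfirmed": unconfirmed}


result = cross_verify("handle_login", GRAPH_EDGES, SOURCE_FILES)
print(result)
```

In this toy run the graph's sanitizer claim lands in `unconfirmed` because the source snapshot never calls `sanitize_sql`, so an agent following this pattern would not treat the query as safe on the graph's word alone. A substring match is a deliberately crude stand-in; a real check would need proper code analysis.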