May 14, 2026

When Smarter Agents Get Fooled by Three Extra Nodes in a Database

30 minutes

Source: Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning

Paper was published on May 10, 2026

This episode was AI-generated on May 12, 2026. The script was written by an AI language model and the host voices were synthesized by ElevenLabs. The producer is not affiliated with Anthropic or ElevenLabs.

Nine frontier models, three providers, 269 trials: every single time, the agent trusted a lie planted in its knowledge graph by an attacker who added just three nodes. A new paper defines an attack class called Oracle Poisoning and, along the way, uncovers a methodological problem that may mean a chunk of existing AI safety evaluation has been measuring the wrong thing.

Key Takeaways

Why Oracle Poisoning is genuinely distinct from prompt injection, RAG poisoning, training-data poisoning, and tool poisoning — and why that distinction matters

The delivery-mode finding: the same model rejects poisoned data when it is presented inline but trusts it 100% of the time when it arrives through a real SDK tool call, with implications for how every agentic safety evaluation is run

Why system prompt hardening has zero measurable effect against this attack — and which defenses (read-only access, multi-tool cross-verification) actually work

The asymmetry that makes this cheap: corrupting the knowledge graph that describes a codebase is dramatically easier than corrupting the codebase itself

The unsettling hypothesis that more capable reasoning may increase susceptibility, not reduce it, because better reasoners produce more confident wrong answers from corrupted premises

Where the paper's claims are strongest and where they reach — including the single-system empirical base and the missing human baseline

00:00 — The Plato's Cave framing and why reasoning quality isn't epistemic security
Setting up the core thesis that a better reasoner working from corrupted facts is no less wrong — just more convincingly wrong.

03:23 — What Oracle Poisoning is, and what it isn't
Walking through how the attack differs from prompt injection, RAG poisoning, training-data poisoning, and tool poisoning.

06:47 — The fake sanitizer attack in concrete detail
How adding three nodes to a 42-million-node graph flips an agent's SQL injection verdict — and how the agent rationalizes away disconfirming evidence.

10:11 — The economics: why the map is less defended than the territory
Why modifying the knowledge graph that describes code is dramatically cheaper than modifying the code itself.

13:34 — The empirical result: 269 out of 269
The cross-model evaluation across nine frontier models and the step-function jump from L1 to L2 attacker sophistication.

16:58 — The delivery-mode discovery
Why the same content is rejected inline but trusted through a real tool call — and what this means for the validity of existing safety evaluations.

20:22 — Steelman: where the paper's claims reach
The directed-prompt dependency, the single production system tested, and the missing human baseline.

23:46 — What defenses actually work, and which famously don't
Read-only access and multi-tool cross-verification work; generic skepticism prompts and system prompt hardening do not.

27:09 — The frame shift: risk moves from the model to its environment
Why agentic AI safety increasingly depends on the integrity of tools and data channels, not the model's reasoning.
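
A toy sketch of the fake sanitizer idea and the cross-verification defense discussed in the episode. This is not the paper's code or its graph schema. The node names, the three added nodes, the `role` field, and the `SOURCE_CODE` snippet are all hypothetical, chosen only to show why a verdict that reads the graph alone can be flipped while one that also checks the code is not.

```python
# Toy illustration only. Every name below (nodes, files, fields) is hypothetical.

# The "territory": the actual source code, which the attacker does not touch.
SOURCE_CODE = {
    "handlers.py": (
        "def get_user(request):\n"
        "    q = \"SELECT * FROM users WHERE id = \" + request.args[\"id\"]\n"
        "    return db.execute(q)\n"
    ),
}


def build_graph():
    """The "map": a tiny knowledge graph describing the code above."""
    return {
        "nodes": {
            "get_user": {"kind": "function", "file": "handlers.py"},
            "db.execute": {"kind": "sink"},
        },
        "edges": [("get_user", "calls", "db.execute")],
    }


def poison(graph):
    """Assumed attack shape: add three nodes claiming a sanitizer is on the path."""
    graph["nodes"]["sanitize_sql"] = {"kind": "function", "role": "sanitizer"}
    graph["nodes"]["sanitize_sql_test"] = {"kind": "test"}
    graph["nodes"]["sanitize_sql_doc"] = {"kind": "doc"}
    graph["edges"] += [
        ("get_user", "calls", "sanitize_sql"),
        ("sanitize_sql_test", "covers", "sanitize_sql"),
        ("sanitize_sql_doc", "describes", "sanitize_sql"),
    ]
    return graph


def claimed_sanitizers(graph, func):
    """Names of sanitizer nodes the graph says `func` calls."""
    return [
        dst
        for src, rel, dst in graph["edges"]
        if src == func
        and rel == "calls"
        and graph["nodes"].get(dst, {}).get("role") == "sanitizer"
    ]


def graph_only_verdict(graph, func):
    """Trusts the graph as an oracle: any claimed sanitizer means 'safe'."""
    return "safe" if claimed_sanitizers(graph, func) else "vulnerable"


def cross_verified_verdict(graph, func, source):
    """Accepts a sanitizer claim only if the function's own file mentions it."""
    file_text = source.get(graph["nodes"][func].get("file", ""), "")
    confirmed = [name for name in claimed_sanitizers(graph, func) if name in file_text]
    return "safe" if confirmed else "vulnerable"


if __name__ == "__main__":
    clean = build_graph()
    poisoned = poison(build_graph())
    print("clean graph, graph only:      ", graph_only_verdict(clean, "get_user"))
    print("poisoned graph, graph only:   ", graph_only_verdict(poisoned, "get_user"))
    print("poisoned graph, cross-checked:", cross_verified_verdict(poisoned, "get_user", SOURCE_CODE))
```

The substring check stands in for a second, independent tool (for example, reading the file directly). A real cross-check would need something sturdier, but the point carries: the poisoned claim survives only as long as nothing compares the map against the territory.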