AI Papers: A Deep Dive

How Uber Caught 206 Leaked Credentials With an LLM-Powered Security Stack


Listen Later

How Uber Caught 206 Leaked Credentials With an LLM-Powered Security Stack

Source: ADR: An Agentic Detection System for Enterprise Agentic AI Security

Paper was published on May 17, 2026

This episode was AI-generated on May 19, 2026. The script was written by an AI language model and the host voices were synthesized by Eleven Labs. The producer is not affiliated with Anthropic or Eleven Labs.

When a developer's AI assistant reads a poisoned Jira ticket and quietly exfiltrates SSH keys, traditional endpoint security sees nothing wrong. A new paper from Uber describes the first real production deployment of LLM-based security monitoring for AI agents — running across 7,200 hosts for ten months — and the architecture it lands on may become the template for how enterprises defend against agentic threats.

Key Takeaways
  • Why endpoint security tools are structurally blind to AI agent attacks — the 'semantic gap' between a syscall and the prompt that caused it
  • How ADR mirrors a human Security Operations Center with four tiers: a lightweight Sensor, a cheap Tier 1 triage LLM, a Tier 2 investigator that reads tool source code, and an offline evolutionary red team
  • Why reading the tool's actual source code matters more than reading its description — the 'tool rug pull' problem and a 10-point recall hit when you remove it
  • The standout production result: a simple regex-and-entropy prevention layer that caught 206 leaked credentials with only 6 false positives
  • Where the paper's claims weaken under scrutiny: 67% recall, a 49% production false positive rate, a self-built benchmark, and no ablation on the evolutionary red team
  • The emerging third threat category beyond external attackers and malicious insiders: trusted agents that can be talked into things
    • 00:00 — The Agent Flayer attack and the semantic gap
      A walkthrough of how a poisoned Jira ticket can hijack an AI coding assistant, and why traditional endpoint detection cannot see the attack at all.
    • 03:08 — MCP and the explosion of agent attack surface
      How the Model Context Protocol turned AI assistants into something that can touch thousands of real enterprise systems, and why that reshapes the defender's problem.
    • 06:16 — The SOC analogy and ADR's four-part architecture
      How the paper mirrors a human Security Operations Center with a Sensor, Tier 1 triage, Tier 2 investigation, and an offline Explorer.
    • 09:25 — Why the Sensor lives on the endpoint, not the network
      The design decision to parse local agent session logs instead of intercepting traffic at a gateway, and what that buys in forensic visibility.
    • 12:33 — Tier 2 as an MCP client that reads source code
      How the senior investigator agent pulls context on demand — including the actual implementation of tools — and why that one capability drives most of the detection quality.
    • 15:42 — The Explorer: selectively breeding worst-case attacks
      How an evolutionary red-teaming loop generates, mutates, and scores synthetic attacks offline to inoculate the production detector against attacks it has never seen.
    • 18:50 — The credential prevention result
      How detection data led to a simple regex-based prevention layer that blocked 206 credentials from leaving the company with only 6 false positives.
    • 21:59 — Benchmark numbers and the honest steelman
      Where ADR beats baselines two-to-four-fold, and where the 67% recall, 49% production false positive rate, self-built benchmark, and missing Explorer ablation deserve scrutiny.
    • 25:07 — The bigger picture: a new threat category
      Why AI security is fundamentally a semantic problem, and why 'trusted insiders being talked into things' is a threat model that didn't operationally exist two years ago.
    • Recommended Reading
      • Model Context Protocol Specification — The Anthropic-introduced standard that the episode identifies as the source of the expanded attack surface ADR is built to defend.
      • AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents — The third-party benchmark ADR is evaluated on — useful for the listener who wants to understand how prompt injection against tool-using agents is measured outside the authors' own benchmark.
      • Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection — The foundational paper on indirect prompt injection — the exact attack class the episode's opening Agent Flayer scenario exemplifies.
      • ...more
        View all episodesView all episodes
        Download on the App Store

        AI Papers: A Deep DiveBy paperdive.ai