ToxSec - AI and Cybersecurity Podcast

Distillation Raids, Slopsquatting, and the Agent Trap


Listen Later

TL;DR: Cloudflare blocks 230 billion threats per day and just dropped the receipts. Bots are running 94% of all login attempts. Attackers are measuring ROI per exploit. And the three attack vectors nobody’s patching: model distillation raids, slopsquatting, and indirect prompt injection, are carving through the AI stack wide open.

This is the public feed. Upgrade to see what doesn’t make it out.

The Internet Runs on Robots Now and They’re Mostly Hostile

Cloudflare sits in front of roughly 20% of global web traffic, which makes their threat data as close to ground truth as we get. Their Cloudforce One team just published the inaugural 2026 Threat Report, and the headline stat ruins your morning: 94% of all login attempts come from bots. Automated scripts, running 24/7. Of all login attempts, bot and human combined, 63% involve credentials already compromised elsewhere

The bigger finding: attackers have stopped chasing complexity. They run ROI calculations now. Why spend $200K on a zero-day when a stolen session token gets the same access for free? Three AI attack chains are delivering the best returns right now. Here’s how each one works.

Distillation Raids: 16 Million Stolen Conversations

Quick concept: a large AI model costs billions and years to train. Distillation is the shortcut — you feed a smaller model the outputs of the big one until it starts mimicking it. Legit labs do this internally. The attack version skips the R&D bill entirely.

Anthropic just named three Chinese labs — DeepSeek, Moonshot AI, and MiniMax — for running this against Claude. The numbers: 24,000 fraudulent accounts, over 16 million total exchanges, coordinated to dodge rate limiting. DeepSeek’s technique was sharp: their accounts asked Claude to walk through its own reasoning step by step, generating chain-of-thought data — transcripts of how Claude thinks, not just what it says. Premium training material. Anthropic traced them through traffic patterns, payment metadata, and canary tokens: unique strings planted in training data specifically to fingerprint unauthorized extraction.

The real problem isn’t the IP theft. When you distill a model by extraction, the safety guardrails don’t survive the copy. The raw capability does. That stripped-down version is exactly what you want for offensive operations, and Anthropic says that’s where some of this is headed.

Your Agent Got Owned While Summarizing a Blog Post

If you use an AI agent — any tool that browses the web, reads documents, and takes actions on your behalf — this applies to you.

Prompt injection is slipping malicious instructions into an AI’s input. The direct version, you’re talking to the AI and you sneak in the payload. Indirect is sneakier: attackers seed instructions into web content and wait for an agent to find them. No targeting required.

The specific surface getting hit right now is URL summarization. Agents do this constantly. Attackers embed hidden commands inside articles and landing pages, formatted to look like a new instruction from you. The AI reads the page, hits the injected text, and can’t distinguish “content I’m processing” from “orders from my operator.” It obeys. Your agent forwards session data or exfils credentials while you’re looking at a clean summary on your screen.

Slopsquatting: The Vibe Coder Tax

Vibe coding is letting an AI write your software while you describe what you want. Fast, popular, and it has a failure mode attackers are already monetizing.

AI coding tools hallucinate package names. A package is a pre-built code library your project pulls in rather than writing from scratch. When your AI writes code that needs one, it sometimes invents a name that sounds real but doesn’t exist. A 2025 study across 576,000 generated code samples found this happens roughly 20% of the time. The critical detail: 43% of hallucinated names repeat consistently. That makes them predictable, and predictable means registerable.

The proof is live. A Lasso Security researcher found LLMs consistently hallucinated huggingface-cli as a Python package. She registered it with nothing inside and logged 30,000 downloads in three months — 30,000 developers who ran pip install huggingface-cli because their AI said to. A separate researcher found react-codeshift already referenced across 237 GitHub repositories before anyone claimed it. He got there first. Next time, an attacker will.

When an agent auto-installs dependencies mid-session, the whole chain runs with no human in the loop. The AI hallucinates a name, calls the package manager, and executes whatever the attacker uploaded. No social engineering. The model lies, and the lie was pre-registered.

The Math Doesn’t Lie

All three of these attacks share the same root cause: AI systems extend trust they haven’t earned. APIs trust high-volume requests. Agents trust the content they read. Package managers trust whatever the model asks for. None of these are theoretical. All three are running in production right now. The question isn’t whether your stack got hit. It’s whether your logging is good enough to find out.

Paid unlocks the unfiltered version: complete archive, private Q&As, and early drops.

Frequently Asked Questions

What is slopsquatting in AI generated code?

AI coding tools hallucinate package names roughly 20% of the time, and 43% of those fake names repeat predictably. Slopsquatting is when an attacker registers those phantom packages on PyPI or npm before anyone notices, then loads them with whatever payload they want. Your AI says pip install something that doesn’t exist yet, and the attacker already owns the name. One researcher logged 30,000 downloads in three months on a single hallucinated package with nothing inside it.

How do prompt injection attacks work on AI agents?

An attacker seeds hidden instructions into a webpage or document, then waits for your AI agent to read it. The agent can’t tell the difference between content it’s summarizing and orders from its operator, so it obeys the injected text. While you’re looking at a clean summary on your screen, the agent is quietly forwarding session tokens or credentials to wherever the payload told it to.

What is AI model distillation theft?

Distillation is how you train a cheap model by feeding it outputs from an expensive one. The attack version skips the billion-dollar R&D bill entirely: spin up thousands of fake accounts, bombard the target API with carefully structured prompts, and harvest the reasoning traces. Anthropic just caught three labs running exactly this against Claude with 24,000 accounts and 16 million exchanges. The kicker is that safety guardrails don’t survive the copy, so what comes out the other side is raw capability with no safety layer.

ToxSec is run by an AI Security Engineer with hands-on experience at the NSA, Amazon, and across the defense contracting sector. CISSP certified, M.S. in Cybersecurity Engineering. He covers AI security vulnerabilities, attack chains, and the offensive tools defenders actually need to understand.



Get full access to ToxSec - AI and Cybersecurity at www.toxsec.com/subscribe
...more
View all episodesView all episodes
Download on the App Store

ToxSec - AI and Cybersecurity PodcastBy ToxSec