
Why it matters. The economics of vulnerability discovery just broke. In twenty minutes, Claude Opus 4.6 found a novel use-after-free memory bug in Firefox — one of the most audited codebases on the internet, backed by millions of CPU hours of continuous fuzzing. That single result is a waypoint on a documented curve: from GPT-4 exploiting 87% of known one-day vulnerabilities with 91 lines of LangChain code in 2024, to Anthropic's red team finding 500+ high-severity zero-days in well-maintained open-source software in early 2026, to a live collaboration with Mozilla that found 22 Firefox vulnerabilities in two weeks. We are in a window: finding is democratized, but reliable exploitation still has friction. This episode documents the curve, names what's most at risk, and argues for what defenders must do before the gap closes.
University of Illinois Urbana-Champaign. The baseline was set in April 2024 by researchers at UIUC in arXiv:2404.08144. Daniel Kang and colleagues wrapped GPT-4 in 91 lines of code and pointed it at fifteen real-world systems with known critical vulnerabilities. GPT-4 successfully exploited 87% when given the CVE description — and 7% without it. Every other model tested, including GPT-3.5 and every open-source model available at the time, scored zero percent. The gap between GPT-4 and the field was absolute. Two years later, that gap is the story.
The Researchers. Daniel Kang leads the work from UIUC's systems and security group. The paper established the quantitative baseline that every subsequent AI vulnerability paper has been measured against, and it remains the foundational citation for understanding how fast the offense side is moving.
Key Technical Concepts. The paper's core insight: the model had the capability to exploit, but needed knowledge to aim it. The 80-percentage-point gap between having the CVE description and not having it reveals that vulnerability exploitation is a knowledge problem as much as a reasoning problem — which is precisely why the subsequent shift to zero-day discovery (no CVE, no description) represents such a qualitative jump. The agent architecture used LangChain for tool orchestration and required no custom scaffolding beyond a standard ReAct loop.
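The loop itself is simple enough to sketch. The Python below is an illustrative reconstruction, not the paper's 91 lines: the scripted model, the tool name, and the "Action:"/"Final:" prompt format are assumptions standing in for a real LLM behind LangChain's ReAct agent.

```python
# Minimal ReAct-style agent loop: the model alternates thought/action and
# observation until it emits a final answer. All names here are illustrative
# stand-ins, not the UIUC paper's actual code.

def run_react_agent(llm, tools, task, max_steps=10):
    """Drive an Action -> Observation loop until the model answers."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = llm(transcript)            # model proposes its next step
        transcript += reply + "\n"
        if reply.startswith("Final:"):     # model declares it is done
            return reply[len("Final:"):].strip()
        if reply.startswith("Action:"):    # e.g. "Action: fetch_page|/admin"
            name, _, arg = reply[len("Action:"):].strip().partition("|")
            result = tools[name](arg)      # run the named tool on its argument
            transcript += f"Observation: {result}\n"
    return None                            # gave up within the step budget

# Stub model and tool, just to show the control flow end to end.
def scripted_llm(transcript):
    if "Observation:" not in transcript:
        return "Action: fetch_page|/admin"
    return "Final: page fetched"

tools = {"fetch_page": lambda path: f"<html>contents of {path}</html>"}
answer = run_react_agent(scripted_llm, tools, "probe the target")
```

The point of the 91-line figure is visible in the sketch: the scaffolding is trivial, and all the capability lives in the model behind `llm`.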
Anthropic. On February 5, 2026, Anthropic's red team published "0-Days" — a documented account of Claude Opus 4.6 finding over 500 high-severity zero-day vulnerabilities in open-source codebases that had already accumulated millions of CPU hours of continuous fuzzing. The bugs had gone undetected for decades. The model found them out of the box, with no custom scaffolding, by reasoning about code the way a human researcher would: reading past fixes for similar bugs, spotting patterns that tend to cause problems, constructing the exact input required to trigger failure.
The Researchers. Nicholas Carlini and Keane Lucas led the red team effort at Anthropic. Carlini is one of the field's most cited adversarial ML researchers; his prior work includes foundational attacks on neural networks and cryptographic applications of deep learning. The paper represents a practical operationalization of capabilities that the security community had theorized about but not documented at this scale.
Key Technical Concepts. The qualitative shift from one-day to zero-day exploitation matters enormously. Fuzzing — the dominant industrial technique for finding memory bugs — works by throwing random inputs at code to see what breaks. It is brute force, and it has been running for years on these codebases. LLM-based auditing instead reasons about program logic: understanding control flow, tracing data through memory allocations, identifying use-after-free and buffer overflow patterns by semantic understanding rather than probabilistic input mutation. The bugs fuzzers miss are structurally different from the bugs LLMs miss — and they're the same class of bugs that memory-safe languages like Rust eliminate architecturally.
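The contrast can be made concrete with a toy example. Everything below is invented for illustration — the `parse` function and its crash condition are hypothetical — but it shows why blind byte mutation struggles with bugs that require a specific structured input, while reading the code yields the triggering input directly.

```python
import random

def parse(data: bytes):
    """Toy parser whose bug is reachable only via a specific structured input."""
    if data[:4] == b"MAGI" and len(data) > 8 and data[8] == 0xFF:
        raise MemoryError("simulated memory-safety crash")
    return "ok"

def fuzz(seed: bytes, iterations: int, rng: random.Random):
    """Blind mutation fuzzing: flip one random byte of the seed, report crashes."""
    for i in range(iterations):
        buf = bytearray(seed)
        buf[rng.randrange(len(buf))] = rng.randrange(256)  # single random mutation
        try:
            parse(bytes(buf))
        except MemoryError:
            return i               # crash found at iteration i
    return None                    # budget exhausted, nothing found

rng = random.Random(0)
# Starting far from the crashing shape, one-byte mutation cannot reach it...
blind = fuzz(b"AAAAAAAAAA", 2000, rng)
# ...while "reasoning" about the check yields the trigger in one step.
crafted = b"MAGI" + b"\x00" * 4 + b"\xff"
```

Real fuzzers are coverage-guided and far smarter than this sketch, but the structural limitation is the same: they search the input space, while a code-reading auditor searches the program logic.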
Mozilla. In March 2026, Anthropic and Mozilla announced a formal collaboration: Claude Opus 4.6 auditing the Firefox codebase. The result — 22 vulnerabilities in two weeks, 14 classified high-severity — represents almost a fifth of all high-severity Firefox vulnerabilities remediated across the entirety of 2025. The model scanned approximately 6,000 C++ files and submitted 112 unique reports. Mozilla's own blog post documents the coordinated disclosure process. Exploitation attempts cost $4,000 in API credits and succeeded in two cases, both requiring a sandboxed environment with security features disabled.
The Researchers. The collaboration involved Anthropic's red team alongside Mozilla's security engineering group. Mozilla maintains one of the most mature continuous fuzzing programs in open source — the OSS-Fuzz integration for Firefox has been running for years — making this an unusually high baseline against which to measure AI-assisted discovery.
Key Technical Concepts. Use-after-free vulnerabilities — the class of bug found in the first twenty minutes — occur when code continues to access memory after it has been freed, allowing an attacker to overwrite arbitrary data. They are the dominant class of critical memory safety bugs in C++ and the reason browser vendors have invested heavily in sandboxing, control flow integrity, and memory tagging. The Firefox result demonstrates that LLM-assisted auditing can find this class at scale in a codebase where human-expert and automated methods have been running continuously for over a decade.
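As a toy illustration of the bug class — this is not Anthropic's or Mozilla's tooling — a few lines of Python can flag the textbook free-then-use shape in C source. Real use-after-free bugs hide behind aliasing, callbacks, and cross-function lifetimes that defeat this kind of lexical check, which is exactly where semantic reasoning about the code earns its keep.

```python
import re

def find_use_after_free(c_source: str):
    """Naive lexical check: flag identifiers referenced after free(x) with no
    intervening reassignment. Real UAF analysis needs alias and path
    reasoning; this catches only the textbook single-function shape."""
    findings = []
    freed = {}  # identifier -> line number where it was freed
    for lineno, line in enumerate(c_source.splitlines(), start=1):
        m = re.search(r"\bfree\(\s*(\w+)\s*\)", line)
        if m:
            freed[m.group(1)] = lineno
            continue
        for name, freed_at in list(freed.items()):
            if re.search(rf"\b{name}\s*=", line):    # reassigned: no longer dangling
                del freed[name]
            elif re.search(rf"\b{name}\b", line):    # used while dangling
                findings.append((name, freed_at, lineno))
    return findings

snippet = """
buf = malloc(64);
free(buf);
memcpy(dst, buf, 64);   /* use after free */
"""
findings = find_use_after_free(snippet)
```

The gap between this twenty-line check and the Firefox result is the whole story: the pattern is easy to name, and extraordinarily hard to find reliably in six thousand files of real C++.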
ICLR 2026. ZeroDayBench, presented at the ICLR 2026 "Agents in the Wild" workshop, establishes the first structured benchmark for evaluating LLM agents on finding and patching 22 novel critical vulnerabilities in open-source software. Models tested include frontier systems from OpenAI, Anthropic, and xAI. The benchmark's finding: frontier models are "not yet capable of autonomously solving" the hardest tasks at full scale.
The Researchers. The ZeroDayBench authors represent the growing subfield of AI-assisted security research that has emerged specifically in response to the capability jumps documented by the Carlini/Lucas and Fang papers. The benchmark itself is a signal: the field is moving fast enough that a structured evaluation framework was needed.
Key Technical Concepts. ZeroDayBench is the honest counterpoint, and naming it is what earns this episode its credibility. Fully autonomous zero-day discovery at scale — finding and reliably weaponizing bugs without human involvement — remains hard. The benchmark quantifies exactly where the frontier is, which makes it more useful than either dismissal or alarm. Related evaluation frameworks include CyberSecEval from Meta and InterCode from Princeton.
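Stripped to its skeleton, a benchmark of this kind is a task set, an agent callable, and a solve-rate report. The harness below is a generic sketch — the tasks, checkers, and stub agent are invented and borrow nothing from ZeroDayBench's actual task set or scoring code.

```python
# Generic agent-benchmark harness: run an agent over tasks, score with
# per-task checkers. Every name below is hypothetical illustration.

def evaluate(agent, tasks):
    """A task counts as solved if its ground-truth checker accepts the answer."""
    solved = sum(1 for t in tasks if t["check"](agent(t["prompt"])))
    return solved / len(tasks)

# Hypothetical tasks spanning the find / patch / exploit spectrum.
tasks = [
    {"prompt": "find the off-by-one in copy_row()",
     "check": lambda a: "off-by-one" in a},
    {"prompt": "patch the double free in close_conn()",
     "check": lambda a: "free" in a},
    {"prompt": "weaponize the heap overflow end to end",
     "check": lambda a: a == "working exploit"},
]

def stub_agent(prompt):
    # Finds and patches, but cannot complete the full exploit chain --
    # the qualitative pattern the benchmark reports for frontier models.
    if "off-by-one" in prompt:
        return "off-by-one"
    if "free" in prompt:
        return "freed pointer nulled"
    return "gave up"

solve_rate = evaluate(stub_agent, tasks)   # partial credit, not full autonomy
```

The interesting number in the real benchmark is exactly this ratio: high on discovery-style tasks, low on end-to-end weaponization.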
Google. The Google Threat Intelligence Group's 2025 Zero-Day Review counted 90 zero-days exploited in the wild in 2025, up from 78 in 2024. Forty-eight percent targeted enterprise software — an all-time high. Commercial surveillance vendors are now attributed more zero-days than traditional nation-states. The 2026 forecast explicitly flags AI acceleration as a factor. GTIG tracks zero-day exploitation across all major platform vendors; their annual review is the primary empirical baseline for real-world exploitation trends.
The Researchers. GTIG's analysis draws on Google's threat intelligence corpus, including Project Zero disclosures, Mandiant incident response data, and public CVE records. Project Zero's 0-day "In the Wild" tracking spreadsheet is the most granular public dataset of exploited zero-days.
Key Technical Concepts. The enterprise software targeting shift matters for defenders: security appliances, VPN concentrators, and network edge devices are now the preferred entry point for state actors and commercial surveillance vendors alike, precisely because they sit outside EDR coverage and process untrusted traffic by design. The same reasoning capability that found Firefox bugs applies equally to SCADA and industrial control software — older, less-audited codebases with smaller security teams and longer patch cycles.
February 2026. Co-RedTeam documents multi-agent orchestration for security discovery and exploitation — not a single model but coordinated networks of agents handling reconnaissance, vulnerability identification, exploit development, and verification as a pipeline. The paper represents the industrialization phase: the capability is being systematized, productized, and made accessible to actors without deep security expertise.
The Researchers. The Co-RedTeam authors are part of a growing literature on agentic security systems that has expanded rapidly since the Fang et al. 2024 baseline. Related work includes PentestGPT and HackingBuddyGPT, both of which demonstrated LLM-assisted penetration testing at earlier capability levels.
Key Technical Concepts. Multi-agent architectures lower the barrier further by decomposing the vulnerability pipeline into specialized sub-agents: one reasons about code to find candidate bugs, another constructs proof-of-concept exploits, a third verifies impact. This mirrors how human red teams work — divide by expertise — and it means the capability no longer requires a single model to be good at everything simultaneously. The coordination layer is agentic scaffolding of the same kind used in software engineering agents like SWE-agent.
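The decomposition can be sketched as a staged pipeline. Every stage below is a stub with invented names, standing in for an LLM-backed agent; the structure, not the logic, is what the Co-RedTeam style of system contributes.

```python
# Staged multi-agent pipeline: recon -> find -> exploit -> verify, each stage
# a specialized agent consuming the previous stage's output. All stages here
# are illustrative stubs, not any real system's agents.

def recon_agent(target):
    """Map the attack surface (stub: a fixed component list)."""
    return {"target": target, "surface": ["parser", "auth"]}

def finder_agent(recon):
    """Audit each surface component for candidate bugs (stub)."""
    return [{"component": c, "bug": f"candidate bug in {c}"}
            for c in recon["surface"]]

def exploit_agent(candidate):
    """Draft a proof-of-concept input for a candidate (stub)."""
    return {**candidate, "poc": f"poc for {candidate['component']}"}

def verify_agent(attempt):
    """Confirm impact, e.g. by running the PoC in a sandbox (stub)."""
    return {**attempt, "verified": attempt["component"] == "parser"}

def pipeline(target):
    """Chain the stages; keep only verified results."""
    results = [verify_agent(exploit_agent(c))
               for c in finder_agent(recon_agent(target))]
    return [r for r in results if r["verified"]]

confirmed = pipeline("example-service")
```

The division of labor is the payload: no single stage needs to be good at everything, and any stage can be swapped for a stronger model as capabilities improve.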
Daily Tech Feed: From the Labs is available on Apple Podcasts, Spotify, and wherever fine podcasts are distributed. Visit us at pod.c457.org for all our shows. New episodes daily.
By Daily Tech Feed