The Adversarial Testing Podcast

By Damian

Data-driven deep dives into AI adversarial testing, safety, and evaluation.

· Technology

FAQs about The Adversarial Testing Podcast:

How many episodes does The Adversarial Testing Podcast have?

The podcast currently has 4 episodes available.

The Adversarial Testing Podcast episodes:

June 22, 2026 ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?
A verbatim reading of ExploitGym, a large-scale benchmark of eight hundred and ninety-eight real-world vulnerabilities across userspace programs, Google's V8 JavaScript engine, and the Linux kernel, designed to measure whether AI agents can turn a proof-of-vulnerability into a working exploit that achieves unauthorized code execution. The authors find that frontier models can already exploit a non-trivial fraction of these targets, with the strongest configurations producing working exploits for over a hundred and fifty instances, and that standard mitigations reduce but do not eliminate agent success. The paper establishes exploitation as an under-evaluated, dual-use capability and argues that autonomous exploit development by AI agents is no longer hypothetical.
41min
June 18, 2026 Predicting Model Behavior Before Release by Simulating Deployment
OpenAI describes Deployment Simulation, a method for previewing how a candidate model will behave in the real world before release by replaying recent, de-identified production conversations with the new model. This episode reads OpenAI's write-up in full and closes with a deeper, paper-based look at the technical methodology: the five-step resampling pipeline, how forecast error is decomposed, and the tool-simulator affordances that make agentic simulation realistic.
26min
June 18, 2026 Viral Prompt Shows ChatGPT's Content Filters Don't Work
Mindgard red-team research shows that ChatGPT's image generator can be manipulated into producing violent and sexually explicit content that users never directly requested, by abusing a fun viral "restore the photo" prompt circulating on social media. This episode walks through how nondescript prompts slip past input and output filters, why prompt repetition makes things worse, and OpenAI's inadequate response. Content warning: discussion of death, sexual violence and murder. The exact jailbreak prompts are deliberately not included.
11min
June 18, 2026 AI Act: EP approves simplification measures and "nudifier" app ban
The European Parliament has given final approval to changes to the EU AI Act under the digital omnibus package. The reforms postpone several high-risk and watermarking deadlines to give companies legal certainty, reduce overlapping rules for machinery products, and extend SME-style exemptions to small mid-cap enterprises. Alongside the simplification measures, Parliament secured an outright ban on "nudifier" apps and AI systems that generate child sexual abuse material or non-consensual intimate imagery.
6min
