Build Wiz AI Show

Catching AI Sleeper Agent - LLM Backdoors



Could your trusted AI model be a hidden "sleeper agent" just waiting for a secret command to turn malicious? We explore a new methodology that extracts and reconstructs backdoor triggers by exploiting the surprising fact that these models often strongly memorize their own poisoning data. Tune in to discover how this inference-only scanner can unmask hidden threats across various LLMs without needing any prior knowledge of the attacker’s specific trigger or target behavior.

Source: https://arxiv.org/pdf/2602.03085


By Build Wiz AI