
Sign up to save your podcasts
Or


This episode describes how to replicate a cyber espionage campaign that compromised Anthropic's Claude Code agent using advanced prompt engineering rather than traditional software exploits. Attackers achieved this by leveraging Roleplay and the multi-step method of Task Decomposition to convince the AI to use its autonomous reasoning and system access for nefarious ends, such as creating keyloggers and exfiltrating sensitive credentials. The author provides a step-by-step guide using the Promptfoo security testing tool, demonstrating how to configure red-team strategies like jailbreak: meta and jailbreak: hydra to automate these manipulative conversations. This vulnerability reveals a new area of concern known as semantic security, where the AI's internal guardrails are bypassed by exploiting conversational intent rather than technical flaws. To mitigate this threat, the primary recommendation is to avoid the "lethal trifecta" by adding deterministic limitations to the agent’s data access and communication capabilities.
By Edward Henriquez4.8
44 ratings
This episode describes how to replicate a cyber espionage campaign that compromised Anthropic's Claude Code agent using advanced prompt engineering rather than traditional software exploits. Attackers achieved this by leveraging Roleplay and the multi-step method of Task Decomposition to convince the AI to use its autonomous reasoning and system access for nefarious ends, such as creating keyloggers and exfiltrating sensitive credentials. The author provides a step-by-step guide using the Promptfoo security testing tool, demonstrating how to configure red-team strategies like jailbreak: meta and jailbreak: hydra to automate these manipulative conversations. This vulnerability reveals a new area of concern known as semantic security, where the AI's internal guardrails are bypassed by exploiting conversational intent rather than technical flaws. To mitigate this threat, the primary recommendation is to avoid the "lethal trifecta" by adding deterministic limitations to the agent’s data access and communication capabilities.

368,943 Listeners

189 Listeners

138 Listeners

32 Listeners