April 24, 2026

EP162: AI agents beat humans with malicious skills

21 minutes

This paper provides a comprehensive survey of the agent skills paradigm, a modular approach that allows large language models (LLMs) to acquire specialized procedural expertise on demand without retraining. Instead of encoding all knowledge in model weights, this architecture uses composable packages of instructions, code, and resources—often formalized through the SKILL.md specification—to enable dynamic capability extension.

Key areas covered in the survey include:

Architectural Foundations: The paper highlights a progressive disclosure architecture that loads information in three stages (metadata, instructions, and resources) to minimize context window consumption. It also defines the "agentic stack," where skills provide the procedural "what to do" while the Model Context Protocol (MCP) provides the connectivity for "how to connect".
Skill Acquisition: The authors categorize four primary modalities for obtaining skills: human-authoring, reinforcement learning with skill libraries (e.g., SAGE), autonomous exploration (e.g., SEAgent), and compositional synthesis.
Deployment and Benchmarks: The primary domain for these skills is the computer-use agent (CUA) stack, where agents navigate GUIs. The paper notes significant progress on benchmarks like OSWorld, where success rates have recently surpassed human baselines.
Security Risks: Empirical analysis revealed that 26.1% of community-contributed skills contain vulnerabilities, such as prompt injection and data exfiltration.
Proposed Governance: To address these risks, the authors propose a Skill Trust and Lifecycle Governance Framework. This model uses four sequential verification gates—static analysis, semantic classification, behavioral sandboxing, and permission validation—to assign skills to graduated trust tiers.

The paper concludes by identifying seven open challenges, including cross-platform portability and skill selection at scale, providing a research agenda for developing trustworthy, self-improving skill ecosystems.

...more

View all episodes

By Yun Wu

April 24, 2026

EP162: AI agents beat humans with malicious skills

21 minutes

Key areas covered in the survey include:

Architectural Foundations: The paper highlights a progressive disclosure architecture that loads information in three stages (metadata, instructions, and resources) to minimize context window consumption. It also defines the "agentic stack," where skills provide the procedural "what to do" while the Model Context Protocol (MCP) provides the connectivity for "how to connect".
Skill Acquisition: The authors categorize four primary modalities for obtaining skills: human-authoring, reinforcement learning with skill libraries (e.g., SAGE), autonomous exploration (e.g., SEAgent), and compositional synthesis.
Deployment and Benchmarks: The primary domain for these skills is the computer-use agent (CUA) stack, where agents navigate GUIs. The paper notes significant progress on benchmarks like OSWorld, where success rates have recently surpassed human baselines.
Security Risks: Empirical analysis revealed that 26.1% of community-contributed skills contain vulnerabilities, such as prompt injection and data exfiltration.
Proposed Governance: To address these risks, the authors propose a Skill Trust and Lifecycle Governance Framework. This model uses four sequential verification gates—static analysis, semantic classification, behavioral sandboxing, and permission validation—to assign skills to graduated trust tiers.

...more

Share EP162: AI agents beat humans with malicious skills

Sign up to save your podcasts

EP162: AI agents beat humans with malicious skills

EP162: AI agents beat humans with malicious skills