Learning GenAI via SOTA Papers

EP162: AI agents beat humans with malicious skills


This paper provides a comprehensive survey of the agent skills paradigm, a modular approach that allows large language models (LLMs) to acquire specialized procedural expertise on demand without retraining. Instead of encoding all knowledge in model weights, this architecture uses composable packages of instructions, code, and resources—often formalized through the SKILL.md specification—to enable dynamic capability extension.
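As a rough illustration of loading skill content on demand, the sketch below stages a skill into context in three steps (metadata, then instructions, then resources), the staging the survey calls progressive disclosure. The `Skill` and `SkillContext` classes, their fields, and the example skill are hypothetical names invented for this sketch; they are not taken from the paper or from the SKILL.md specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Skill:
    """Hypothetical in-memory stand-in for one skill package."""
    name: str
    description: str   # stage 1: short metadata, always visible
    instructions: str  # stage 2: loaded only when the skill is selected
    resources: Dict[str, str] = field(default_factory=dict)  # stage 3: loaded on request


class SkillContext:
    """Tracks which pieces of skill text have been placed in the model's context."""

    def __init__(self, skills: List[Skill]) -> None:
        self.skills = {s.name: s for s in skills}
        self.loaded: List[str] = []

    def load_metadata(self) -> None:
        # Stage 1: one short line per skill, so many skills stay cheap to list.
        for s in self.skills.values():
            self.loaded.append(f"{s.name}: {s.description}")

    def activate(self, name: str) -> None:
        # Stage 2: pull in the full instructions for the selected skill only.
        self.loaded.append(self.skills[name].instructions)

    def fetch_resource(self, name: str, key: str) -> None:
        # Stage 3: pull in a single bundled resource when the task needs it.
        self.loaded.append(self.skills[name].resources[key])

    def context_chars(self) -> int:
        # Character count as a crude proxy for context window consumption.
        return sum(len(chunk) for chunk in self.loaded)


if __name__ == "__main__":
    skill = Skill(
        name="pdf-forms",  # hypothetical example skill
        description="Fill in PDF forms",
        instructions="Step 1: inspect the form fields. Step 2: map inputs to fields.",
        resources={"field_reference.md": "Long reference text. " * 50},
    )
    ctx = SkillContext([skill])
    ctx.load_metadata()
    print("after metadata:", ctx.context_chars())
    ctx.activate("pdf-forms")
    print("after instructions:", ctx.context_chars())
    ctx.fetch_resource("pdf-forms", "field_reference.md")
    print("after resource:", ctx.context_chars())
```

The point of the staging is that the context cost grows only when a skill is actually selected and a resource is actually needed.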

Key areas covered in the survey include:

  • Architectural Foundations: The paper highlights a progressive disclosure architecture that loads information in three stages (metadata, instructions, and resources) to minimize context window consumption. It also defines the "agentic stack," where skills provide the procedural "what to do" while the Model Context Protocol (MCP) provides the connectivity for "how to connect".
  • Skill Acquisition: The authors categorize four primary modalities for obtaining skills: human-authoring, reinforcement learning with skill libraries (e.g., SAGE), autonomous exploration (e.g., SEAgent), and compositional synthesis.
  • Deployment and Benchmarks: The primary domain for these skills is the computer-use agent (CUA) stack, where agents navigate GUIs. The paper notes significant progress on benchmarks like OSWorld, where success rates have recently surpassed human baselines.
  • Security Risks: The survey reports an empirical analysis finding that 26.1% of community-contributed skills contain vulnerabilities, such as prompt injection and data exfiltration.
  • Proposed Governance: To address these risks, the authors propose a Skill Trust and Lifecycle Governance Framework. This model uses four sequential verification gates—static analysis, semantic classification, behavioral sandboxing, and permission validation—to assign skills to graduated trust tiers.
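The four sequential gates in the governance bullet above could be wired together roughly as follows. This is a minimal sketch under stated assumptions: the individual checks are toy placeholders, the skill dictionary keys are invented, and mapping "number of consecutive gates passed" to a trust tier is an assumption of this example, since the summary does not say how the paper defines its tiers.

```python
from typing import Callable, Dict, List, Tuple

Skill = Dict[str, object]  # hypothetical skill record; keys below are assumptions


def static_analysis(skill: Skill) -> bool:
    # Placeholder check: reject an obviously destructive shell pattern in bundled code.
    return "rm -rf" not in str(skill.get("code", ""))


def semantic_classification(skill: Skill) -> bool:
    # Placeholder check: crude keyword match for injection-style instructions.
    text = str(skill.get("instructions", "")).lower()
    return "ignore previous instructions" not in text


def behavioral_sandboxing(skill: Skill) -> bool:
    # Placeholder check: assume a sandbox run recorded any outbound hosts contacted.
    return not skill.get("observed_network_hosts", [])


def permission_validation(skill: Skill) -> bool:
    # Placeholder check: requested permissions must be a subset of declared ones.
    requested = set(skill.get("requested_permissions", []))
    declared = set(skill.get("declared_permissions", []))
    return requested <= declared


# Gate order follows the summary: static, semantic, behavioral, permissions.
GATES: List[Tuple[str, Callable[[Skill], bool]]] = [
    ("static_analysis", static_analysis),
    ("semantic_classification", semantic_classification),
    ("behavioral_sandboxing", behavioral_sandboxing),
    ("permission_validation", permission_validation),
]


def assign_trust_tier(skill: Skill) -> Tuple[int, List[str]]:
    """Run gates in order, stop at the first failure, return (tier, gates passed).

    Assumption of this sketch: tier 0 means no gate passed, tier 4 means all did.
    """
    passed: List[str] = []
    for name, check in GATES:
        if not check(skill):
            break
        passed.append(name)
    return len(passed), passed


if __name__ == "__main__":
    clean = {
        "code": "print('hello')",
        "instructions": "Summarize the open document.",
        "observed_network_hosts": [],
        "requested_permissions": ["read_files"],
        "declared_permissions": ["read_files"],
    }
    print(assign_trust_tier(clean))
```

Because the gates run in sequence, a skill that fails an early, cheap check never reaches the more expensive sandboxing step.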

The paper concludes by identifying seven open challenges, including cross-platform portability and skill selection at scale, providing a research agenda for developing trustworthy, self-improving skill ecosystems.

By Yun Wu