Daily Tech Feed: From the Labs

SkillRL: Don't Give Agents Memories, Give Them Skills



Episode 008: SkillRL — Teaching AI Agents to Learn Skills, Not Memories

Why it matters. LLM agents that learn from experience typically store raw trajectories — verbose logs of every action taken. This is like studying for an exam by memorizing every page of the textbook instead of extracting key concepts. "SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning" introduces a framework that distills experience into compact, reusable skills and evolves them through reinforcement learning. Using Qwen 2.5 at just 7 billion parameters, SkillRL achieves 89.9% on ALFWorld — beating GPT-4o by 41.9 points and Gemini 2.5 Pro by 29.6 points — while compressing knowledge 10–20× compared to raw trajectory storage.

UNC Chapel Hill / AIMING Lab. SkillRL comes from the AIMING Lab at the University of North Carolina at Chapel Hill, with contributions from NEC Labs America. The paper is available on arXiv (2602.08234) with an HTML version. Code is open-sourced on GitHub. The framework uses Qwen 2.5-7B-Instruct as its base model and OpenAI's o3 as the teacher model for skill distillation. It builds on the GRPO reinforcement learning algorithm and is evaluated on ALFWorld, WebShop, and seven search-augmented QA tasks.

The Researchers. Huaxiu Yao, the senior author, is an Assistant Professor of Computer Science at UNC Chapel Hill and director of the AIMING Lab, working on adaptive intelligent agents, foundation models, and AI alignment. First author Peng Xia is a PhD student at UNC Chapel Hill focused on multimodal AI and agent systems. Cihang Xie is an Assistant Professor at UC Santa Cruz specializing in computer vision and machine learning. Zeyu Zheng contributes reinforcement learning expertise, while Xujiang Zhao and Haifeng Chen bring applied AI research experience from NEC Labs America.

Key Technical Concepts. SkillRL introduces three innovations: (1) experience-based distillation that extracts strategic patterns from successes and counterfactual lessons from failures into a hierarchical skill library called SkillBank; (2) adaptive retrieval that selects both general and task-specific skills at inference time; and (3) recursive evolution, in which the skill library co-evolves with the agent's policy during RL training. Prior approaches like Reflexion, ExpeL, and MemRL either store raw trajectories or perform only shallow self-reflection; SkillRL's hierarchical abstraction is what enables the dramatic performance leap.
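To make the three mechanisms concrete, here is a minimal Python sketch of how a SkillBank-style library might work. This is our own illustrative reconstruction, not the paper's code: the class and method names (`Skill`, `SkillBank`, `distill`, `retrieve`, `update`) and the win-rate scoring are assumptions; in the actual framework, distillation is performed by a teacher model (o3) and evolution happens inside GRPO training.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """One distilled skill: a compact lesson, not a raw trajectory."""
    name: str
    description: str
    scope: str          # "general" or a task family, e.g. "alfworld"
    uses: int = 0
    wins: int = 0

class SkillBank:
    """Hypothetical hierarchical skill library (general + task-specific tiers)."""

    def __init__(self):
        self.skills = []

    def distill(self, success, lesson, scope):
        # (1) Experience-based distillation: successes yield strategic
        # patterns, failures yield counterfactual lessons.
        kind = "pattern" if success else "lesson"
        self.skills.append(Skill(f"{kind}-{len(self.skills)}", lesson, scope))

    def retrieve(self, task_scope, k=2):
        # (2) Adaptive retrieval: mix general skills with skills specific
        # to the current task, preferring higher empirical win rates.
        pool = [s for s in self.skills if s.scope in ("general", task_scope)]
        pool.sort(key=lambda s: s.wins / s.uses if s.uses else 0.0,
                  reverse=True)
        return pool[:k]

    def update(self, skill, success):
        # (3) Recursive evolution: outcome feedback reshapes which skills
        # survive as the agent's policy improves during RL training.
        skill.uses += 1
        skill.wins += int(success)

# Toy usage: two episodes feed the bank, then retrieval guides the next one.
bank = SkillBank()
bank.distill(True, "examine receptacles before searching new rooms", "alfworld")
bank.distill(False, "do not repeat an action that already failed", "general")
selected = bank.retrieve("alfworld", k=2)
for s in selected:
    bank.update(s, success=True)
```

The key design point this illustrates is compression: the bank stores a handful of short lessons rather than verbose action logs, which is where the reported 10–20× savings over raw trajectory storage would come from.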

Daily Tech Feed: From the Labs is available on Apple Podcasts, Spotify, and wherever fine podcasts are distributed. Visit us at pod.c457.org for all our shows. New episodes daily.
