Best AI papers explained

Self-Evolving Curriculum for LLM Reasoning



This paper presents Self-Evolving Curriculum (SEC), a method for reinforcement learning (RL) fine-tuning of large language models (LLMs) that aims to improve their reasoning capabilities. SEC frames curriculum selection as a non-stationary Multi-Armed Bandit (MAB) problem in which each problem category is an "arm". The curriculum policy is learned concurrently with LLM training: the absolute advantage from policy gradient methods serves as a proxy for learning gain, and the policy uses it to adjust which problems are presented as training progresses. The paper reports results across planning, inductive reasoning, and mathematics, showing improved generalization to harder problems and better skill balance in multi-domain training.
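To make the bandit framing concrete, here is a minimal Python sketch of a curriculum policy over problem categories. The summary above only states that arms are categories and that absolute advantage is the learning-gain signal; everything else here is an assumption for illustration: the class name `CurriculumBandit`, the exponential-moving-average value update, the softmax sampling rule, and the `step_size` and `temperature` parameters. It is not taken from the paper's code.

```python
import math
import random


class CurriculumBandit:
    """Illustrative non-stationary bandit; each problem category is one arm."""

    def __init__(self, categories, step_size=0.5, temperature=1.0, seed=0):
        # One running value estimate per category, starting at zero.
        self.q = {c: 0.0 for c in categories}
        self.step_size = step_size      # assumed: constant step size to track drift
        self.temperature = temperature  # assumed: softmax temperature
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax over value estimates (shifted by the max for numerical stability).
        m = max(self.q.values())
        weights = {c: math.exp((v - m) / self.temperature) for c, v in self.q.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}

    def sample(self):
        # Draw the next training category according to the current policy.
        probs = self.probabilities()
        cats = list(probs)
        return self.rng.choices(cats, weights=[probs[c] for c in cats], k=1)[0]

    def update(self, category, advantages):
        # Reward = mean absolute advantage of the rollouts from this category.
        if not advantages:
            raise ValueError("advantages must be non-empty")
        reward = sum(abs(a) for a in advantages) / len(advantages)
        # Moving-average update so that recent learning gain dominates.
        self.q[category] += self.step_size * (reward - self.q[category])
        return reward


# Usage with made-up advantage values (not results from the paper).
bandit = CurriculumBandit(["easy", "medium", "hard"])
bandit.update("easy", [0.0, 0.0, 0.0, 0.0])      # always solved: no learning signal
bandit.update("medium", [0.5, -0.5, 0.5, -0.5])  # mixed outcomes: strong signal
probs = bandit.probabilities()
next_category = bandit.sample()
```

In this sketch, a category whose rollouts all receive the same reward has zero advantage and so loses sampling weight, while a category with mixed outcomes gains weight. The constant step size is one simple way to handle non-stationarity, since the model's ability on each category changes during training.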


Best AI papers explained, by Enoch H. Kang