May 25, 2025

Self-Evolving Curriculum for LLM Reasoning

14 minutes

This document presents Self-Evolving Curriculum (SEC), a method for reinforcement learning (RL) fine-tuning of large language models (LLMs) that aims to improve their reasoning capabilities. SEC frames curriculum selection as a non-stationary Multi-Armed Bandit (MAB) problem in which each problem category is an "arm". It learns the curriculum policy concurrently with LLM training, using the absolute advantage from policy gradient methods as a proxy for learning gain to dynamically adjust which problems are presented. The document reports SEC's effectiveness across planning, inductive reasoning, and mathematics, with improved generalization to harder problems and better skill balance in multi-domain training.
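The bandit framing described above can be illustrated with a small sketch. It is not taken from the episode or the paper: the class name `CurriculumBandit`, the exponential-moving-average value update, the softmax sampling rule, and the `step_size` and `temperature` parameters are all assumptions chosen for illustration. The only parts grounded in the description are that arms are problem categories and that the reward is the absolute advantage.

```python
import math
import random


class CurriculumBandit:
    """Hypothetical non-stationary bandit over problem categories."""

    def __init__(self, categories, step_size=0.5, temperature=1.0, seed=0):
        # One running value estimate per arm (problem category).
        self.values = {c: 0.0 for c in categories}
        self.step_size = step_size      # assumed constant step size
        self.temperature = temperature  # assumed softmax temperature
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax over value estimates (an assumed sampling rule).
        scaled = {c: v / self.temperature for c, v in self.values.items()}
        peak = max(scaled.values())  # subtract the max for numerical stability
        weights = {c: math.exp(s - peak) for c, s in scaled.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}

    def sample(self):
        # Draw the category for the next training batch.
        probs = self.probabilities()
        cats = list(probs)
        return self.rng.choices(cats, weights=[probs[c] for c in cats], k=1)[0]

    def update(self, category, advantages):
        # Reward is the mean absolute advantage over the sampled rollouts.
        reward = sum(abs(a) for a in advantages) / len(advantages)
        old = self.values[category]
        # A constant step size weights recent rewards more heavily,
        # which is one common way to handle non-stationary rewards.
        self.values[category] = old + self.step_size * (reward - old)
        return reward


bandit = CurriculumBandit(["easy", "medium", "hard"])
bandit.update("easy", [0.0, 0.0, 0.0, 0.0])      # always solved: no signal
bandit.update("medium", [0.5, -0.5, 0.5, -0.5])  # mixed outcomes: strong signal
bandit.update("hard", [0.0, 0.0, 0.0, 0.0])      # never solved: no signal
probs = bandit.probabilities()
next_category = bandit.sample()
```

In this toy run the "medium" category receives the highest sampling probability, since its rollouts are the only ones with nonzero absolute advantage. Categories the model always solves or always fails contribute zero advantage and are sampled less often.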

By Enoch H. Kang