Intelligence Unbound

On-Policy Distillation: Efficient Post-Training for Language Models



This episode introduces and evaluates On-Policy Distillation (OPD) as a highly efficient method for post-training large language models (LLMs). The authors categorize LLM training into three phases (pre-training, mid-training, and post-training) and distinguish on-policy training, which samples from the student model, from off-policy training, which imitates external sources.
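The on-policy/off-policy distinction the episode draws can be made concrete in code. Below is a minimal sketch, assuming PyTorch and two HuggingFace-style causal LMs (`student` and `teacher`) that share a tokenizer; the per-token reverse-KL loss on student-sampled completions used here is one common way to instantiate on-policy distillation, and the function and variable names are illustrative rather than taken from the episode.

```python
# Minimal sketch, assuming PyTorch and HuggingFace-style causal LMs.
# `student`, `teacher`, and `prompt_ids` are illustrative names, not from the episode.
import torch
import torch.nn.functional as F

def on_policy_distill_loss(student, teacher, prompt_ids, max_new_tokens=64):
    """Sample a completion from the student, then grade each sampled token
    against the frozen teacher with a per-token reverse KL."""
    # On-policy step: the trajectory is drawn from the student itself.
    with torch.no_grad():
        sampled = student.generate(prompt_ids, max_new_tokens=max_new_tokens,
                                   do_sample=True)

    # Re-run both models on the sampled sequence to get per-token distributions.
    student_logits = student(sampled).logits[:, :-1]   # position i predicts token i+1
    with torch.no_grad():
        teacher_logits = teacher(sampled).logits[:, :-1]

    log_p_s = F.log_softmax(student_logits, dim=-1)
    log_p_t = F.log_softmax(teacher_logits, dim=-1)

    # Reverse KL(student || teacher) at every position.
    kl = (log_p_s.exp() * (log_p_s - log_p_t)).sum(dim=-1)   # [batch, seq_len - 1]

    # Only penalize positions that predict completion tokens, not the prompt.
    mask = torch.zeros_like(kl)
    mask[:, prompt_ids.shape[1] - 1:] = 1.0
    return (kl * mask).sum() / mask.sum()
```

The key difference from off-policy distillation is where `sampled` comes from: here it is drawn from the student's own distribution rather than imitated from an external source, so the teacher only grades states the student actually visits.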


By Fourth Mind