Intelligence Unbound

On-Policy Distillation: Efficient Post-Training for Language Models



This episode introduces and evaluates On-Policy Distillation (OPD) as a highly efficient method for post-training large language models (LLMs). The authors categorize LLM training into three phases (pre-training, mid-training, and post-training) and distinguish on-policy training, which samples from the student model, from off-policy training, which imitates external sources.
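The on-policy/off-policy distinction the episode draws can be made concrete in code. Below is a minimal sketch, assuming PyTorch and two HuggingFace-style causal LMs (`student` and `teacher`) that share a tokenizer; the per-token reverse-KL loss on student-sampled completions used here is one common way to instantiate on-policy distillation, and the function and variable names are illustrative rather than taken from the episode.

```python
# Minimal sketch, assuming PyTorch and HuggingFace-style causal LMs.
# `student`, `teacher`, and `prompt_ids` are illustrative names, not from the episode.
import torch
import torch.nn.functional as F

def on_policy_distill_loss(student, teacher, prompt_ids, max_new_tokens=64):
    """Sample a completion from the student, then grade each sampled token
    against the frozen teacher with a per-token reverse KL."""
    # On-policy step: the trajectory is drawn from the student itself.
    with torch.no_grad():
        sampled = student.generate(prompt_ids, max_new_tokens=max_new_tokens,
                                   do_sample=True)

    # Re-run both models on the sampled sequence to get per-token distributions.
    student_logits = student(sampled).logits[:, :-1]   # position i predicts token i+1
    with torch.no_grad():
        teacher_logits = teacher(sampled).logits[:, :-1]

    log_p_s = F.log_softmax(student_logits, dim=-1)
    log_p_t = F.log_softmax(teacher_logits, dim=-1)

    # Reverse KL(student || teacher) at every position.
    kl = (log_p_s.exp() * (log_p_s - log_p_t)).sum(dim=-1)   # [batch, seq_len - 1]

    # Only penalize positions that predict completion tokens, not the prompt.
    mask = torch.zeros_like(kl)
    mask[:, prompt_ids.shape[1] - 1:] = 1.0
    return (kl * mask).sum() / mask.sum()
```

The key difference from off-policy distillation is where `sampled` comes from: here it is drawn from the student's own distribution rather than imitated from an external source, so the teacher only grades states the student actually visits.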


By Fourth Mind