Steven AI Talk

Post-Training Methods for Large Language Models



https://arxiv.org/html/2502.21321v2

The sources explore the application of reinforcement learning (RL) to refine Large Language Models (LLMs), treating text generation as a sequence of decisions. They discuss how RL methods, particularly those using reward models trained on human preferences, enable LLMs to produce outputs that are not just statistically likely but also aligned with desired characteristics like accuracy and helpfulness. Various optimization techniques, from classical policy gradients like REINFORCE and PPO to newer preference-based approaches like DPO and GRPO, are examined for their role in maximizing this learned reward. Additionally, the text touches upon test-time strategies such as Tree-of-Thoughts and Graph-of-Thoughts that enhance multi-step reasoning by exploring different thought processes during inference.
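To make the "text generation as a sequence of decisions" framing concrete, here is a minimal REINFORCE-style sketch. It is a hypothetical toy setup (not code from the paper): a tabular one-step policy over two tokens, with a stand-in reward function playing the role of a learned reward model, updated via the policy-gradient rule that REINFORCE uses.

```python
import math
import random

random.seed(0)

# Toy one-step "policy": logits over two candidate tokens.
logits = {"good": 0.0, "bad": 0.0}

def probs():
    """Softmax over the current logits."""
    z = sum(math.exp(v) for v in logits.values())
    return {k: math.exp(v) / z for k, v in logits.items()}

def reward(token):
    # Stand-in for a learned reward model scoring the sampled output.
    return 1.0 if token == "good" else 0.0

lr = 0.5
for _ in range(200):
    p = probs()
    # Sample an action (token) from the current policy.
    token = random.choices(list(p), weights=list(p.values()))[0]
    r = reward(token)
    # REINFORCE update: for a softmax policy,
    # d log pi(a) / d logit_k = 1{k == a} - p(k), scaled by the reward.
    for k in logits:
        grad = (1.0 if k == token else 0.0) - p[k]
        logits[k] += lr * r * grad

# After training, the policy concentrates on the rewarded token.
print(probs()["good"])
```

Real RLHF pipelines apply the same idea per generated token with a neural policy and typically add variance reduction (baselines) or clipping (PPO); DPO and GRPO instead optimize directly against preference comparisons, avoiding an explicit reward-model rollout loop.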


Steven AI Talk, by Steven