September 10, 2025

GRPO | Group Relative Policy Optimization

14 minutes

In episode 114 we discussed DeepSeek's R1 model, which uses GRPO:

https://youtu.be/D0w44TGNsUs

So, what is GRPO?

GRPO stands for Group Relative Policy Optimization.

It is a reinforcement learning (RL) algorithm developed by DeepSeek, the team behind the R1 reasoning model. GRPO is designed to improve the reasoning capabilities of large language models. It was first introduced in the DeepSeekMath paper and was later used in the post-training of DeepSeek-R1.
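To make the "group relative" part of the name concrete, here is a minimal sketch of the advantage calculation as we understand it from the DeepSeekMath paper: several answers are sampled for the same prompt, and each answer's reward is normalized against the mean and standard deviation of its own group. The function name, the example rewards, the `eps` term, and the choice of population standard deviation are assumptions made for illustration, not DeepSeek's code, and the full algorithm also includes a clipped policy-ratio objective and a KL penalty that are not shown here.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to its own group: (r - mean) / (std + eps).

    `rewards` holds one scalar reward per sampled answer to the same prompt.
    `eps` (an assumption) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four sampled answers to one prompt (1.0 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that beat their group's average get a positive advantage and the rest get a negative one, so no separately trained value model is needed to provide a baseline.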

