Swetlana AI Podcast

GRPO | Group Relative Policy Optimization



In episode 114 we discussed DeepSeek's R1 model, which uses GRPO:


https://youtu.be/D0w44TGNsUs


So, what is GRPO?


GRPO stands for Group Relative Policy Optimization.


It is a reinforcement learning (RL) algorithm developed by DeepSeek, the team behind the R1 reasoning model. GRPO is designed to improve the reasoning capabilities of language models. It was first introduced in the DeepSeekMath paper and was later used in the post-training of DeepSeek-R1.
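
The "group relative" part of the name can be illustrated with a minimal sketch. As described in the DeepSeekMath paper, GRPO samples a group of responses for the same prompt, scores each one, and uses each reward's standing within its group as the advantage, which removes the need for the separate value (critic) model that PPO trains. The function name, the example rewards, the choice of population standard deviation, and the `eps` term below are assumptions for illustration, not DeepSeek's code. The sketch covers only the advantage step, not the clipped policy update or the KL penalty.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per sampled response to the SAME prompt.
    `eps` (an assumption) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the choice is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of 4 responses: two judged correct (1.0), two wrong (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # correct responses get positive values, wrong ones negative
```

Responses that score above the group average receive a positive advantage and are reinforced, while those below it are discouraged. If every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal.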

Hosted on Acast. See acast.com/privacy for more information.


By Swetlana AI