


In episode 114 we discussed DeepSeek's R1 model, which uses GRPO:
https://youtu.be/D0w44TGNsUs
So, what is GRPO?
GRPO stands for Group Relative Policy Optimization.
It is a reinforcement learning (RL) algorithm developed by DeepSeek, the team behind the reasoning model R1, and it is designed to improve the reasoning capabilities of large language models. It was first introduced in the DeepSeekMath paper and was later used in the post-training of DeepSeek-R1.
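To make the "group relative" part concrete: instead of training a separate critic (value) model, GRPO samples a group of outputs for each prompt and scores each output against the group's mean reward, scaled by the group's standard deviation. The Python below is a minimal sketch of only that advantage step, under stated assumptions. The function name `group_relative_advantages`, the 0/1 correctness rewards, and the small `eps` term are illustrative choices, not taken from the episode or from DeepSeek's code. The full algorithm also uses a clipped policy-ratio objective and a KL penalty against a reference model, both omitted here.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Hypothetical helper: score each sampled output relative to its group.

    `rewards` holds one scalar reward per output sampled for the SAME prompt.
    The group mean acts as the baseline (no learned critic), and dividing by
    the group's standard deviation puts advantages on a comparable scale.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; some implementations use the sample std
    # eps avoids division by zero when every output received the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical example: 4 answers sampled for one math prompt,
# rewarded 1.0 if the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
```

Here the correct answers get an advantage of roughly +1 and the incorrect ones roughly -1, so the policy update shifts probability toward the better-than-average samples within each group. If all outputs in a group earn the same reward, every advantage is zero and that group contributes no learning signal.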
Hosted on Acast. See acast.com/privacy for more information.
By Swetlana AI