July 03, 2025

GRPO aka DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

17 minutes

Arxiv link: https://arxiv.org/abs/2402.03300

In this episode of "The AI Research Deep Dive," the host breaks down "DeepSeekMath," a groundbreaking paper that challenges the dominance of massive, proprietary AI models in mathematical reasoning. The discussion centers on how a relatively small, 7-billion-parameter open-source model managed to outperform Google's 540B-parameter Minerva, a model roughly 77 times its size. Listeners will learn about the meticulous three-stage recipe behind this success: an innovative process for building a colossal, high-quality math dataset from the web; the strategic choice to pre-train on code to foster logical reasoning; and the development of a novel and highly efficient reinforcement learning algorithm called Group Relative Policy Optimization (GRPO). The episode highlights how this work provides a powerful blueprint for the open-source community, showing that smart data curation and algorithmic innovation can be more impactful than sheer model size.

By The AI Research Deep Dive
