


This paper introduces DeepSeek-R1, a new family of large language models from DeepSeek-AI designed to enhance reasoning capabilities through reinforcement learning (RL). It details the development of DeepSeek-R1-Zero, a model trained purely with RL that demonstrates strong reasoning but suffers from readability issues, and DeepSeek-R1, which addresses these flaws through multi-stage training that begins with "cold-start" data and achieves performance comparable to OpenAI-o1-1217. The paper also covers the distillation of reasoning abilities from the larger DeepSeek-R1 models into smaller, more efficient models, which are released to the research community. Performance benchmarks across mathematics, coding, and general-knowledge tasks highlight the models' advances. The paper concludes by comparing the effectiveness of distillation against direct RL on smaller models and outlines future research directions.
Source: https://arxiv.org/pdf/2501.12948
By mcgrof