Large Language Model (LLM) Talk

DeepSeek-R1



DeepSeek-R1 is a language model focused on enhanced reasoning, built on the DeepSeek-V3-Base model and trained primarily with reinforcement learning (RL). It uses Group Relative Policy Optimization (GRPO), which reduces computational cost by eliminating the separate critic model required by algorithms such as PPO. Training follows a multi-stage pipeline: an initial fine-tune on cold-start data, reasoning-oriented RL, supervised fine-tuning (SFT) on data selected via rejection sampling, and a final RL stage. A rule-based reward system helps avoid reward hacking, and a language consistency reward applied during RL mitigates language mixing. The model's reasoning capabilities are then distilled into smaller models. DeepSeek-R1 achieves performance comparable to, and sometimes surpassing, OpenAI's o1 series on a range of reasoning, math, and coding tasks.
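To make the GRPO and rule-based-reward ideas concrete, here is a minimal Python sketch, an illustration under assumed reward rules rather than the actual DeepSeek-R1 implementation: each completion's reward comes from simple checks (a hypothetical answer-match rule and a format bonus), and its advantage is computed relative to the other completions sampled for the same prompt, which is what removes the need for a separate critic model.

import re
import statistics

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Hypothetical rule-based reward: an accuracy term when the final answer
    inside \\boxed{...} matches the reference, plus a small format bonus for
    <think>...</think> tags. Rules and weights are illustrative assumptions."""
    reward = 0.0
    match = re.search(r"\\boxed\{(.+?)\}", completion)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0   # accuracy reward
    if "<think>" in completion and "</think>" in completion:
        reward += 0.1   # format reward
    return reward

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: each sampled completion is scored against the
    mean and standard deviation of its own group, so no learned critic is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0   # avoid dividing by zero
    return [(r - mean) / std for r in rewards]

# Example: four completions sampled for the same math prompt, reference answer "42"
completions = [
    "<think>21 * 2 = 42</think> The answer is \\boxed{42}.",
    "The answer is \\boxed{24}.",
    "<think>6 * 7</think> \\boxed{42}",
    "I am not sure.",
]
rewards = [rule_based_reward(c, "42") for c in completions]
advantages = grpo_advantages(rewards)  # these replace the critic's value estimates

Because the baseline is the group's own mean reward, completions that beat their siblings get positive advantages and the rest get negative ones, which is the core trade-off GRPO makes to cut the cost of training a critic.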


Large Language Model (LLM) Talk, by AI-Talk

Rating: 4.0 (4 ratings)


More shows like Large Language Model (LLM) Talk

Super Data Science: ML & AI Podcast with Jon Krohn, by Jon Krohn (303 listeners)

NVIDIA AI Podcast, by NVIDIA (341 listeners)

The Daily, by The New York Times (112,584 listeners)

Learning English from the News, by BBC Radio (264 listeners)

Thinking in English, by Thomas Wilkinson (110 listeners)

AI Agents: Top Trend of 2025 - by AIAgentStore.ai (3 listeners)