


00:00:00 Start
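00:17:41 Evaluating the mathematical ability of DeepSeekMath-Base 7B
00:27:47 Training and evaluation of DeepSeekMath-RL
00:40:24 Exploring algorithms robust to noisy reward signals
00:46:35 Objectives and gradients of DPO, PPO, and GRPO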
00:51:30 Closing
By zc