llm learning road

DeepSeekMath



00:00:00 Start

00:17:41 Evaluating the mathematical ability of DeepSeekMath-Base 7B

00:27:47 Training and evaluation of DeepSeekMath-RL

00:40:24 Exploring algorithms that are robust to noisy reward signals

00:46:35 Objectives and gradients of DPO, PPO, and GRPO

00:51:30 Closing


By zc