
DeepSeek-V3 is a 671B-parameter Mixture-of-Experts large language model. The paper covers the model's architecture, including Multi-Head Latent Attention and an innovative auxiliary-loss-free load-balancing strategy for DeepSeekMoE, and describes the training process: pre-training on 14.8 trillion tokens followed by post-training with supervised fine-tuning and reinforcement learning.
paper: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
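The auxiliary-loss-free load-balancing strategy drops the usual balancing loss term: each expert gets a bias that is added to its routing score only for top-k selection and is nudged up or down depending on how loaded that expert has been. Below is a minimal PyTorch sketch of that idea; the class, names, and hyperparameters (num_experts, top_k, bias_update_speed) are illustrative assumptions, not the paper's implementation.

```python
import torch

class BiasBalancedRouter(torch.nn.Module):
    """Toy MoE router sketching bias-based (auxiliary-loss-free) load balancing."""

    def __init__(self, hidden_dim: int, num_experts: int = 8, top_k: int = 2,
                 bias_update_speed: float = 0.001):
        super().__init__()
        self.centroids = torch.nn.Parameter(torch.randn(num_experts, hidden_dim))
        # Per-expert bias used only for expert selection, not for gating weights.
        self.register_buffer("bias", torch.zeros(num_experts))
        self.top_k = top_k
        self.gamma = bias_update_speed

    def forward(self, x: torch.Tensor):
        # Affinity scores between tokens and expert centroids: (tokens, experts).
        scores = torch.sigmoid(x @ self.centroids.t())
        # The bias shifts selection toward recently underloaded experts.
        topk_idx = torch.topk(scores + self.bias, self.top_k, dim=-1).indices
        # Gating weights come from the unbiased scores of the chosen experts.
        gates = torch.gather(scores, -1, topk_idx)
        gates = gates / gates.sum(dim=-1, keepdim=True)
        return topk_idx, gates

    @torch.no_grad()
    def update_bias(self, topk_idx: torch.Tensor):
        # After each step: lower the bias of overloaded experts and raise it for
        # underloaded ones, so no auxiliary balancing loss is needed.
        load = torch.bincount(topk_idx.flatten(), minlength=self.bias.numel()).float()
        self.bias -= self.gamma * torch.sign(load - load.mean())
```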