Large Language Model (LLM) Talk

DeepSeek v3



DeepSeek-V3 is a large Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated for each token, trained at roughly one-tenth the cost of comparable models. It uses Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture. Key features of DeepSeek-V3 are its auxiliary-loss-free load-balancing strategy and its multi-token prediction training objective. The model was pre-trained on 14.8 trillion tokens, then underwent supervised fine-tuning and reinforcement learning. It has demonstrated strong performance on a range of benchmarks, achieving results comparable to leading closed-source models while keeping training costs economical.
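The "37 billion of 671 billion parameters activated per token" figure comes from sparse MoE routing: a gate scores all experts but only the top-k actually run for each token. The sketch below is a minimal, illustrative version of that idea; DeepSeek-V3's real router additionally uses a bias-based, auxiliary-loss-free balancing mechanism, and all names here (`moe_forward`, the toy experts and gate) are hypothetical.

```python
# Minimal sketch of sparse Mixture-of-Experts routing (illustrative only;
# not DeepSeek-V3's actual router, which adds auxiliary-loss-free balancing).
import math

def moe_forward(x, experts, gate, k=2):
    """Route input vector x to the top-k experts and mix their outputs.

    experts: one weight matrix (list of rows) per expert.
    gate: one score vector per expert.
    Only k experts run per token, which is why the activated parameter
    count is far below the total parameter count.
    """
    # One affinity score per expert: dot(gate[i], x)
    scores = [sum(g * xi for g, xi in zip(gv, x)) for gv in gate]
    # Select the k highest-scoring experts
    topk = sorted(range(len(scores)), key=scores.__getitem__)[-k:]
    # Softmax over the selected experts' scores only
    m = max(scores[i] for i in topk)
    w = [math.exp(scores[i] - m) for i in topk]
    z = sum(w)
    # Weighted sum of the chosen experts' outputs (matrix-vector products)
    out = [0.0] * len(x)
    for wi, i in zip(w, topk):
        y = [sum(row[j] * x[j] for j in range(len(x))) for row in experts[i]]
        out = [o + (wi / z) * yi for o, yi in zip(out, y)]
    return out

# Tiny demo: 3 experts over 2-dimensional inputs, top-2 routing
experts = [[[2.0, 0.0], [0.0, 2.0]],   # expert 0: scale by 2
           [[1.0, 0.0], [0.0, 1.0]],   # expert 1: identity
           [[0.0, 0.0], [0.0, 0.0]]]   # expert 2: zero map
gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
out = moe_forward([1.0, 0.0], experts, gate, k=2)
```

With this input, experts 0 and 1 win the routing and expert 2 never executes; at DeepSeek-V3's scale the same mechanism leaves the vast majority of expert parameters idle for any given token.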


Large Language Model (LLM) Talk, by AI-Talk

Rating: 4.0 (4 ratings)

