Large Language Model (LLM) Talk

DeepSeek-V2


Listen Later

DeepSeek-V2 is a Mixture-of-Experts (MoE) language model that balances strong performance with economical training and efficient inference. It uses a total of 236B parameters, with 21B activated for each token, and supports a context length of 128K tokens. Key architectural innovations includeMulti-Head Latent Attention (MLA), which compresses the KV cache for faster inference, andDeepSeekMoE, which enables economical training through sparse computation. Compared to DeepSeek 67B, DeepSeek-V2 saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts maximum generation throughput by 5.76 times. It is pre-trained on 8.1T tokens of high-quality data and further aligned through Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL).

...more
View all episodesView all episodes
Download on the App Store

Large Language Model (LLM) TalkBy AI-Talk

  • 4
  • 4
  • 4
  • 4
  • 4

4

4 ratings


More shows like Large Language Model (LLM) Talk

View all
Super Data Science: ML & AI Podcast with Jon Krohn by Jon Krohn

Super Data Science: ML & AI Podcast with Jon Krohn

303 Listeners

NVIDIA AI Podcast by NVIDIA

NVIDIA AI Podcast

341 Listeners

The Daily by The New York Times

The Daily

112,601 Listeners

Learning English from the News by BBC Radio

Learning English from the News

266 Listeners

Thinking in English by Thomas Wilkinson

Thinking in English

109 Listeners

AI Agents: Top Trend of 2025 - by AIAgentStore.ai by AIAgentStore.ai

AI Agents: Top Trend of 2025 - by AIAgentStore.ai

3 Listeners