DeepSeek-V3 is a large Mixture-of-Experts (MoE) language model, trained at roughly 10x lower cost, with 671 billion total parameters, of which 37 billion are activated for each token. It uses the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures. Key features of DeepSeek-V3 are its auxiliary-loss-free load-balancing strategy and its multi-token prediction training objective. The model was pre-trained on 14.8 trillion tokens and then underwent supervised fine-tuning and reinforcement learning. It has demonstrated strong performance on various benchmarks, achieving results comparable to leading closed-source models while maintaining economical training costs.
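To make the "37 billion of 671 billion parameters activated per token" point concrete, here is a minimal sketch of top-k routed MoE in PyTorch. It is not DeepSeek-V3's actual implementation (which uses DeepSeekMoE with shared and fine-grained experts and an auxiliary-loss-free balancing scheme); the expert count, dimensions, and k value below are illustrative assumptions, chosen only to show how a router sends each token through a small subset of experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k routed MoE layer: each token is dispatched to only k of the
    n experts, so only a fraction of the layer's parameters is active per token."""
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)  # per-token gating scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, n_experts)
        top_vals, top_idx = scores.topk(self.k, dim=-1)
        gates = F.softmax(top_vals, dim=-1)      # normalize over the k chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):               # accumulate weighted expert outputs
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e     # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += gates[mask, slot:slot + 1] * expert(x[mask])
        return out

# Usage: 16 tokens, each processed by only 2 of the 8 experts
layer = TopKMoE()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)   # torch.Size([16, 512])
```

In a full model, the ratio of activated to total parameters is set by how many experts exist versus how many the router selects per token, which is how DeepSeek-V3 keeps per-token compute far below what its total parameter count would suggest.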