Large Language Model (LLM) Talk

DeepSeek v3



DeepSeek-V3 is a large Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated for each token, and it was trained at roughly 10x lower cost. It uses the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures. Two key features of DeepSeek-V3 are its auxiliary-loss-free load-balancing strategy and its multi-token prediction training objective. The model was pre-trained on 14.8 trillion tokens and then underwent supervised fine-tuning and reinforcement learning. It performs strongly on a range of benchmarks, with results comparable to leading closed-source models, while keeping training costs economical.
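To make the sparse-activation and load-balancing ideas concrete: 37 billion of 671 billion parameters is about 5.5% of the model active per token, because a router sends each token to only a few experts. The summary above does not describe how the auxiliary-loss-free balancing works, so the following is a minimal sketch under one assumption: each expert carries a bias that is added to its routing score for top-k selection only, and that bias is nudged down when the expert is overloaded and up when it is underloaded, with no extra loss term. The function names, the toy sizes (8 experts, top-2), the random scores, and the step size are all hypothetical and chosen only for illustration.

```python
import random


def route_top_k(scores, bias, k):
    """Return the indices of the k experts with the highest (score + bias)."""
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i] + bias[i],
                    reverse=True)
    return ranked[:k]


def update_bias(bias, load, step):
    """Lower the bias of experts above the mean load, raise it for those below."""
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean_load:
            new_bias.append(b - step)
        elif n < mean_load:
            new_bias.append(b + step)
        else:
            new_bias.append(b)
    return new_bias


def simulate(num_experts=8, k=2, num_tokens=512, rounds=200, step=0.01, seed=0):
    """Route random tokens for several rounds, adjusting biases after each round."""
    rng = random.Random(seed)
    # Hypothetical skew: expert 0 systematically receives higher scores,
    # so without correction it would attract far more tokens than the rest.
    skew = [0.5 if i == 0 else 0.0 for i in range(num_experts)]
    bias = [0.0] * num_experts
    history = []
    for _ in range(rounds):
        load = [0] * num_experts
        for _ in range(num_tokens):
            scores = [rng.random() + skew[i] for i in range(num_experts)]
            for expert in route_top_k(scores, bias, k):
                load[expert] += 1
        history.append(load)
        bias = update_bias(bias, load, step)
    return history, bias


if __name__ == "__main__":
    history, bias = simulate()
    print("active fraction per token:", round(37 / 671, 3))
    print("first-round load:", history[0])
    print("last-round load: ", history[-1])
    print("final biases:    ", [round(b, 2) for b in bias])
```

Running the script prints the per-expert token counts for the first and last rounds; in this toy setup the favored expert starts out heavily overloaded and the counts even out as the biases adapt. In a real MoE layer the scores would come from a learned router rather than a random generator.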


By AI-Talk


