Programmers Quickie

🤖 DeepSeek-V3: A 671B Parameter Mixture-of-Experts Language Model



This episode covers DeepSeek-V3, a 671B-parameter Mixture-of-Experts language model. It highlights the model's architecture, including its innovative load balancing and multi-token prediction strategies, and its efficient training process using FP8 precision. Benchmark results show that DeepSeek-V3 performs strongly against other open-source models and some closed-source models, particularly on math and code tasks. The document also provides instructions for running DeepSeek-V3 locally with various frameworks and hardware, including NVIDIA and AMD GPUs and Huawei Ascend NPUs. Finally, it includes licensing and contact information.


Programmers Quickie, by Software Engineering

3.1

7 ratings

