The paper introduces DeepSeek-V3, a large-scale Mixture-of-Experts language model. Its design incorporates novel architectural features such as Multi-Head Latent Attention and an auxiliary-loss-free load-balancing strategy, and training is carried out efficiently in FP8 mixed precision. The model was pre-trained on a massive dataset of 14.8 trillion tokens at low cost and achieves state-of-the-art performance across a wide range of benchmarks, particularly in code and mathematics. Post-training techniques, including knowledge distillation, further enhance its reasoning capabilities. Finally, the paper offers suggestions for improving future AI hardware designs.
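
To make the auxiliary-loss-free load-balancing idea mentioned above more concrete, here is a minimal NumPy sketch of the general mechanism the paper describes: a per-expert bias term influences which experts are selected in top-k routing, while the gating weights themselves come from the unbiased affinity scores, and the bias is nudged after each step toward balancing expert load. The function names, array shapes, and the update speed `gamma` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def topk_routing_with_bias(affinity, bias, k):
    """Select top-k experts per token using bias-adjusted scores.

    affinity: (num_tokens, num_experts) raw token-to-expert affinities
    bias:     (num_experts,) load-balancing bias, used only for selection
    """
    adjusted = affinity + bias                          # bias steers selection only
    topk_idx = np.argsort(-adjusted, axis=-1)[:, :k]    # indices of chosen experts
    gates = np.take_along_axis(affinity, topk_idx, axis=-1)
    gates = gates / gates.sum(axis=-1, keepdims=True)   # gating weights from raw affinities
    return topk_idx, gates

def update_bias(bias, expert_load, gamma=0.001):
    """Nudge the bias: push down overloaded experts, pull up underloaded ones.

    expert_load: (num_experts,) token counts routed to each expert this step
    gamma:       assumed bias update speed (a small hyperparameter)
    """
    mean_load = expert_load.mean()
    return bias - gamma * np.sign(expert_load - mean_load)
```

Because the balancing pressure comes from this bias adjustment rather than an auxiliary loss added to the training objective, expert utilization can be kept even without trading off the language-modeling loss itself, which is the motivation the paper gives for the strategy.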