Machine Learning Made Simple

Episode 60: DeepSeek Models Explained Part I



What if AI could match enterprise-grade performance at a fraction of the cost? In this episode, we dive deep into DeepSeek, a family of groundbreaking open-source models challenging the tech giants at roughly 95% lower cost. From innovative training optimizations to careful data curation, discover how a resource-constrained startup is redefining what's possible in AI.

🎯 Episode Highlights:

  • Beyond cost-cutting: How DeepSeek matches top-tier AI performance

  • Game-changing memory optimization and pipeline parallelization

  • Inside the technology: Zero-redundancy training and dependency parsing

  • The future of efficient, accessible AI development

Whether you're an ML engineer or an AI enthusiast, learn how clever optimization is democratizing advanced AI capabilities. No GPU farm needed!
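To give a flavor of why zero-redundancy (ZeRO) training matters, here is a minimal back-of-the-envelope sketch. The function and its byte counts are illustrative assumptions (mixed-precision Adam: fp16 weights and gradients, fp32 master weights plus two Adam moments), not figures from the episode or the papers; the point is simply that sharding optimizer states across N data-parallel GPUs divides their memory cost by N.

```python
# Illustrative sketch of ZeRO stage-1 memory savings (hypothetical numbers).
# In stage 1, each of n_gpus data-parallel workers keeps only 1/n_gpus of the
# optimizer states instead of a full replica; weights and gradients stay replicated.

def zero1_memory_per_gpu(params_billion, n_gpus,
                         bytes_weights=2, bytes_grads=2, bytes_opt=12):
    """Rough per-GPU memory (GB) for mixed-precision Adam training.

    bytes_opt=12 assumes fp32 master weights (4) + Adam momentum (4) + variance (4).
    """
    p = params_billion * 1e9
    replicated = p * (bytes_weights + bytes_grads)  # kept in full on every GPU
    sharded = p * bytes_opt / n_gpus                # optimizer states, split N ways
    return (replicated + sharded) / 1e9

baseline = zero1_memory_per_gpu(7, 1)    # no sharding for a 7B model
sharded = zero1_memory_per_gpu(7, 64)    # optimizer states split across 64 GPUs
```

Under these assumptions a 7B model drops from about 112 GB to under 30 GB per GPU once the optimizer states are sharded; stages 2 and 3 of ZeRO extend the same idea to gradients and weights.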


    References for main topic:

    1. [2401.02954] DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

2. [2401.14196] DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

    3. [2405.04434] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    4. [2412.19437] DeepSeek-V3 Technical Report

5. [2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

6. ZeRO-3 Offload (DeepSpeed blog): https://www.deepspeed.ai/2021/03/07/zero3-offload.html

    7. [1910.02054] ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

    8. [2205.05198] Reducing Activation Recomputation in Large Transformer Models

    9. [2406.03488] Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training



Machine Learning Made Simple, by Saugata Chatterjee