Machine Learning Made Simple

Episode 60: DeepSeek Models Explained Part I



What if AI could match enterprise-grade performance at a fraction of the cost? In this episode, we dive deep into DeepSeek, a family of groundbreaking open-source models challenging the tech giants at roughly 95% lower cost. From innovative training optimizations to careful data curation, discover how a resource-constrained startup is redefining what's possible in AI.

🎯 Episode Highlights:

  • Beyond cost-cutting: How DeepSeek matches top-tier AI performance

  • Game-changing memory optimization and pipeline parallelization

  • Inside the technology: Zero-redundancy training and dependency parsing

  • The future of efficient, accessible AI development

Whether you're an ML engineer or an AI enthusiast, learn how clever optimization is democratizing advanced AI capabilities. No GPU farm needed!
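To give a flavor of why zero-redundancy (ZeRO) training matters, here is a minimal back-of-the-envelope sketch. The function and its byte counts are illustrative assumptions (mixed-precision Adam: fp16 weights and gradients, fp32 master weights plus two Adam moments), not figures from the episode or the papers; the point is simply that sharding optimizer states across N data-parallel GPUs divides their memory cost by N.

```python
# Illustrative sketch of ZeRO stage-1 memory savings (hypothetical numbers).
# In stage 1, each of n_gpus data-parallel workers keeps only 1/n_gpus of the
# optimizer states instead of a full replica; weights and gradients stay replicated.

def zero1_memory_per_gpu(params_billion, n_gpus,
                         bytes_weights=2, bytes_grads=2, bytes_opt=12):
    """Rough per-GPU memory (GB) for mixed-precision Adam training.

    bytes_opt=12 assumes fp32 master weights (4) + Adam momentum (4) + variance (4).
    """
    p = params_billion * 1e9
    replicated = p * (bytes_weights + bytes_grads)  # kept in full on every GPU
    sharded = p * bytes_opt / n_gpus                # optimizer states, split N ways
    return (replicated + sharded) / 1e9

baseline = zero1_memory_per_gpu(7, 1)    # no sharding for a 7B model
sharded = zero1_memory_per_gpu(7, 64)    # optimizer states split across 64 GPUs
```

Under these assumptions a 7B model drops from about 112 GB to under 30 GB per GPU once the optimizer states are sharded; stages 2 and 3 of ZeRO extend the same idea to gradients and weights.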


    References for main topic:

    1. [2401.02954] DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

2. [2401.14196] DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

    3. [2405.04434] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    4. [2412.19437] DeepSeek-V3 Technical Report

5. [2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

6. ZeRO-3 Offload (DeepSpeed blog): https://www.deepspeed.ai/2021/03/07/zero3-offload.html

    7. [1910.02054] ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

    8. [2205.05198] Reducing Activation Recomputation in Large Transformer Models

    9. [2406.03488] Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training



Machine Learning Made Simple, by Saugata Chatterjee