Machine Learning Made Simple


Episode 22: How small LLMs (47B) are outperforming GPT-3 (175B) using a Mixture of Experts (MoE)

AI News:

  1. 2402.05120 More Agents Is All You Need

  2. 2403.16971 AIOS: LLM Agent Operating System

  3. 2404.02258 Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

  4. Devika GitHub Repository - Devika: An Agentic AI Software Engineer

  5. T-Rex GitHub Repository - T-Rex: A Large-Scale Relation Extraction Framework

  6. WSJ Article on Cognition Labs - Peter Thiel-backed AI startup Cognition Labs seeks a $2 billion valuation

  7. References for main topic:

    1. 1701.06538 Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    2. 2006.16668 GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

    3. 2101.03961 Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

    4. 2112.06905 GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

    5. 2202.08906 ST-MoE: Designing Stable and Transferable Sparse Expert Models

    6. 2211.15841 MegaBlocks: Efficient Sparse Training with Mixture-of-Experts

    7. 2401.04088 Mixtral of Experts

    8. 1511.07543 Convergent Learning: Do different neural networks learn the same representations?


    9. ...more
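
For listeners who want to see the core idea in code, here is a minimal sketch of a sparsely gated, top-2 Mixture-of-Experts layer in PyTorch, in the spirit of the Sparsely-Gated MoE (1701.06538) and Mixtral (2401.04088) papers listed above. It is an illustration only, not code from any of those papers; the class and parameter names (Expert, TopKMoE, n_experts, k) are made up for the example. The point it demonstrates is why a roughly 47B-parameter MoE model is cheap to run: each token is processed by only its top-k experts, so only a fraction of the total parameters are active per token.

import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A small feed-forward block; each expert is an independent copy."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class TopKMoE(nn.Module):
    """Routes each token to its k highest-scoring experts and mixes their outputs."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([Expert(d_model, d_hidden) for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to (tokens, d_model)
        b, s, d = x.shape
        tokens = x.reshape(-1, d)

        # The router scores each token against every expert and keeps only the top-k.
        logits = self.router(tokens)                      # (tokens, n_experts)
        topk_logits, topk_idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(topk_logits, dim=-1)          # renormalise over the chosen experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = topk_idx == e                          # (tokens, k): slots routed to expert e
            if not mask.any():
                continue
            token_ids = mask.any(dim=-1).nonzero(as_tuple=True)[0]
            gate = (weights * mask).sum(dim=-1)[token_ids].unsqueeze(-1)
            # Only the routed tokens pass through this expert's feed-forward block.
            out[token_ids] += gate * expert(tokens[token_ids])

        return out.reshape(b, s, d)


if __name__ == "__main__":
    moe = TopKMoE(d_model=64, d_hidden=256, n_experts=8, k=2)
    y = moe(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])

With 8 experts and k = 2, only about a quarter of the expert parameters run for any given token; this is roughly how a Mixtral-style model activates on the order of 13B of its ~47B parameters per forward pass while keeping the capacity of the full parameter count.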

Machine Learning Made Simple, by Saugata Chatterjee