Papers Read on AI

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts


Listen Later

State Space Models (SSMs) have become serious contenders in the field of sequential modeling, challenging the dominance of Transformers. At the same time, Mixture of Experts (MoE) has significantly improved Transformer-based LLMs, including recent state-of-the-art open-source models. We propose that to unlock the potential of SSMs for scaling, they should be combined with MoE. We showcase this on Mamba, a recent SSM-based model that achieves remarkable, Transformer-like performance. Our model, MoE-Mamba, outperforms both Mamba and Transformer-MoE. In particular, MoE-Mamba reaches the same performance as Mamba in 2.2x less training steps while preserving the inference performance gains of Mamba against the Transformer.

2024: Maciej Pi'oro, Kamil Ciebiera, Krystian Kr'ol, Jan Ludziejewski, Sebastian Jaszczur



https://arxiv.org/pdf/2401.04081.pdf
...more
View all episodesView all episodes
Download on the App Store

Papers Read on AIBy Rob

  • 3.7
  • 3.7
  • 3.7
  • 3.7
  • 3.7

3.7

3 ratings


More shows like Papers Read on AI

View all
MLOps.community by Demetrios

MLOps.community

23 Listeners

Latent Space: The AI Engineer Podcast by swyx + Alessio

Latent Space: The AI Engineer Podcast

75 Listeners