March 31, 2026

EP138: [Mamba-2] Transformers and SSMs Are the Same Engine

23 minutes

This paper establishes a theoretical connection between state-space models (SSMs) and attention mechanisms through a framework called Structured State Space Duality (SSD). Using the properties of semiseparable matrices, the authors show that the two model families are closely related, which gives a unified view of their linear (recurrent) and quadratic (attention-like) forms.
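
To make the two forms concrete, here is a minimal sketch in plain Python, not the authors' implementation. It assumes a single input channel and a scalar decay a_t per step; the function names and the variables a, B, C, x are illustrative choices. The recurrent form carries a state through the sequence in linear time, while the quadratic form builds a lower-triangular, attention-like matrix whose entry (t, s) is (C_t · B_s) times the product a_{s+1} ... a_t. Both should produce the same outputs.

```python
import random


def ssm_recurrent(a, B, C, x):
    """Linear-time form: h_t = a_t * h_{t-1} + B_t * x_t, y_t = C_t . h_t."""
    n_state = len(B[0])
    h = [0.0] * n_state
    y = []
    for a_t, B_t, C_t, x_t in zip(a, B, C, x):
        # Decay the previous state, then write the new input into it.
        h = [a_t * h_i + b_i * x_t for h_i, b_i in zip(h, B_t)]
        # Read the output from the state.
        y.append(sum(c_i * h_i for c_i, h_i in zip(C_t, h)))
    return y


def ssm_quadratic(a, B, C, x):
    """Quadratic form: y_t = sum over s <= t of (C_t . B_s) * decay(s, t) * x_s."""
    y = []
    for t in range(len(x)):
        total = 0.0
        decay = 1.0  # product a_{s+1} * ... * a_t; empty product when s == t
        for s in range(t, -1, -1):
            score = sum(c * b for c, b in zip(C[t], B[s]))  # plays the role of q_t . k_s
            total += score * decay * x[s]
            decay *= a[s]  # extend the product for the next, earlier position
        y.append(total)
    return y


def random_problem(seq_len=6, n_state=3, seed=0):
    """Hypothetical helper: draw a small random instance for comparison."""
    rng = random.Random(seed)
    a = [rng.uniform(0.5, 1.0) for _ in range(seq_len)]
    B = [[rng.uniform(-1, 1) for _ in range(n_state)] for _ in range(seq_len)]
    C = [[rng.uniform(-1, 1) for _ in range(n_state)] for _ in range(seq_len)]
    x = [rng.uniform(-1, 1) for _ in range(seq_len)]
    return a, B, C, x


if __name__ == "__main__":
    a, B, C, x = random_problem()
    y_rec = ssm_recurrent(a, B, C, x)
    y_quad = ssm_quadratic(a, B, C, x)
    print(max(abs(p - q) for p, q in zip(y_rec, y_quad)))
```

The recurrent version does work proportional to the sequence length, while the quadratic version does work proportional to its square but is expressed as dot products that map naturally onto matrix multiplication.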

The primary contribution is the Mamba-2 architecture, which refines the selective SSM layer to be 2–8× faster than the original Mamba while supporting significantly larger recurrent state sizes. Mamba-2 is designed for hardware efficiency: it leverages matrix multiplication units and enables standard systems optimizations such as tensor parallelism, which were previously difficult to implement for SSMs.

Empirically, the authors report that Mamba-2 Pareto-dominates both the original Mamba and strong Transformer baselines in perplexity and wall-clock time. It performs well on language modeling and on challenging associative recall tests, and it scales to longer sequences and higher information capacity.

By Yun Wu
