State space models like Mamba promised linear scaling and constant memory. They delivered on efficiency, but researchers kept hitting the same wall: ask Mamba to recall something specific from early in a long context, and performance drops.
Three papers at ICLR 2026 independently attacked this limitation. That convergence tells you how fundamental the problem is.
This episode breaks down:
- Why Mamba's fixed-size state causes "lossy compression" of context (first sketch after this list)
- How Mixture of Memories (MoM) adds multiple internal memory banks (second sketch below)
- How Log-Linear Attention finds a middle ground between SSM and full attention (third sketch below)
- Why one paper proves SSMs fundamentally can't solve certain tasks without external tools
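To make the bottleneck concrete, here is a minimal NumPy sketch, not Mamba's actual update rule: the matrices A and B and all sizes are illustrative stand-ins. It contrasts a fixed-size recurrent state with an attention-style cache that grows with the sequence:

```python
import numpy as np

d_state, seq_len = 16, 10_000           # hypothetical sizes
A = 0.95 * np.eye(d_state)              # stand-in for learned decay dynamics
B = 0.1 * np.random.randn(d_state)      # stand-in input projection

state = np.zeros(d_state)               # SSM memory: fixed O(d_state)
kv_cache = []                           # attention memory: grows O(seq_len)

for t in range(seq_len):
    x_t = np.random.randn()             # placeholder scalar token feature
    state = A @ state + B * x_t         # every token folded into one vector
    kv_cache.append(x_t)                # attention keeps each token verbatim

print(state.shape)    # (16,) -- unchanged after 10,000 tokens
print(len(kv_cache))  # 10000 -- recall stays exact, but memory grows
```

The point: after 10,000 tokens the SSM still has only 16 numbers available to answer any recall question, which is why early details get overwritten.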
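A minimal sketch of the MoM idea, assuming a top-1 router over K separate memory states. The names (banks, W_route) and the hard routing are illustrative, not the paper's API:

```python
import numpy as np

K, d_state = 4, 16                           # hypothetical sizes
banks = np.zeros((K, d_state))               # K fixed-size memory states
W_route = 0.1 * np.random.randn(K, d_state)  # hypothetical router weights

def step(x_t):
    scores = W_route @ x_t                   # score each bank for this token
    k = int(np.argmax(scores))               # hard top-1 routing (sketch only)
    banks[k] = 0.95 * banks[k] + x_t         # update only the chosen bank
    return k

for _ in range(1000):
    step(np.random.randn(d_state))           # total state is K times larger
```

Routing means different kinds of information can land in different banks instead of competing for one vector, at the cost of extra parameters and states.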
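And a minimal sketch of the log-linear idea: keep O(log T) block summaries instead of one state or all tokens. The binary-counter merge below illustrates the growth rate only; it is not the paper's algorithm:

```python
import numpy as np

d = 16
levels = []   # levels[i], if set, summarizes a block of 2**i tokens

def insert(x_t):
    carry = x_t.copy()
    for i in range(len(levels)):
        if levels[i] is None:
            levels[i] = carry            # free slot: store block here
            return
        carry = carry + levels[i]        # merge two blocks of size 2**i
        levels[i] = None
    levels.append(carry)                 # new top level: block of 2**len

for t in range(1024):
    insert(np.random.randn(d))

print(len(levels))                       # 11 levels for 1024 tokens: O(log T)
```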
The pattern across all three: you can add more state, but you have to pay somewhere, whether in parameters, mechanism complexity, or system infrastructure. No free lunch.
📄 Papers covered:
- MoM: Linear Sequence Modeling with Mixture-of-Memories
https://arxiv.org/abs/2502.13685
- Log-Linear Attention
https://openreview.net/forum?id=mOJgZWkXKW
- To Infinity and Beyond: Tool-Use Unlocks Length Generalization in SSMs
https://openreview.net/forum?id=sSfep4udCb
📬 Newsletter: https://llmsresearch.substack.com
🐦 Twitter/X: https://x.com/llmsresearch
💻 GitHub: https://github.com/llmsresearch
#Mamba #SSM #StateSpaceModels #ICLR2026 #LLM #MachineLearning #AIResearch #Transformers #DeepLearning

Chapters
0:00 Mamba's secret weakness
0:42 The promise: linear scaling, constant memory
1:14 The catch: forgetting specific details
1:34 Memory bottleneck explained
1:43 Attention = perfect recall filing cabinet
2:10 SSM = single notepad with fixed pages
2:49 The core tradeoff
2:57 Three solutions to fix it
3:00 Solution 1: Mixture of Memories (MoM)
3:51 Solution 2: Log-Linear Attention
4:48 Solution 3: External tool use
5:49 The "no free lunch" pattern
6:41 What wins for longer contexts?
7:04 Subscribe for more research deep dives
By LLMs Research