Mechanical Dreams

M²RNN



In this episode:
• Welcome & Introduction: Professor Norris and Linda welcome the listeners. Linda introduces the paper of the week, teasing the unexpected comeback of non-linear RNNs.
• The Expressivity Gap: Linear vs. Non-Linear RNNs: The hosts discuss how linear RNNs like Mamba and Gated DeltaNet dominate thanks to their efficiency, yet fundamentally lack the expressive power that classic non-linear RNNs bring to complex state-tracking tasks.
• The Real Bottleneck: State Capacity: Linda explains a key insight from the paper: traditional non-linear RNNs failed at language modeling and in-context retrieval not because of their non-linearity, but because they relied on small, vector-valued hidden states.
• Enter M²RNN: Matrix-Valued States: A deep dive into the Matrix-to-Matrix RNN architecture, focusing on how outer product state expansion and an independent forget gate allow it to achieve massive state capacities.
• Hardware Utilization & Systems Engineering: Professor Norris questions the computational cost. Linda explains the ingenious tiling tricks that maximize Tensor Core utilization without padding waste, plus a look at their Tensor Parallelism strategies.
• Empirical Wins & The Power of Hybrids: Reviewing the benchmark results across state tracking, long-context retrieval, and language modeling, highlighting how replacing just a single layer of a hybrid architecture with an M²RNN layer yields large performance gains.
• Conclusion & Wrap-Up: Professor Norris admits he is convinced by the hybrid approach. The hosts summarize the main takeaways and sign off until the next episode.

Mechanical Dreams, by Mechanical Dirk