In this episode:
• Welcome and the Mamba Lineage: Professor Norris and Linda introduce Mamba-3, discussing the shift towards inference-time efficiency and the need for sub-quadratic models.
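• Exponential-Trapezoidal Discretization: Linda explains how Mamba-3 replaces the first-order discretization of earlier Mamba models with a second-order exponential-trapezoidal rule. Because each state update now mixes the current and previous inputs, the recurrence contains an implicit convolution, which removes the need for an explicit causal convolution layer.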
• Complex-Valued States and the RoPE Trick: The hosts discuss the limitations of real-valued SSMs in state tracking and how Mamba-3 uses a data-dependent RoPE trick to efficiently implement complex-valued rotational dynamics.
• MIMO and Hardware Arithmetic Intensity: Linda details how Mamba-3 uses a Multi-Input, Multi-Output (MIMO) formulation to raise arithmetic intensity, so that decoding, which is bottlenecked by memory bandwidth, can make use of otherwise idle compute almost for free.
• Performance Results and Wrap Up: Professor Norris is convinced by the empirical results, noting how Mamba-3 advances the Pareto frontier by matching Mamba-2's performance with half the latency.
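For listeners who want to see the discretization idea concretely, here is a minimal sketch on a scalar toy ODE, h'(t) = a·h(t) + b·x(t), rather than Mamba-3's actual layer. It compares an Euler-style update, which only uses the current input, with a trapezoidal update that puts the textbook weight of 1/2 on the previous and current inputs. The function names, the fixed weight `lam`, and the test signal are our own assumptions; the paper's exact parameterization may differ. The extra `x[k - 1]` term is the piece that behaves like a width-2 convolution over the inputs.

```python
import numpy as np


def euler_scan(a, b, x, dt):
    """First-order update: h_k = exp(a*dt) * h_{k-1} + dt * b * x_k (starts from h_0 = 0)."""
    decay = np.exp(a * dt)
    h = 0.0
    out = []
    for k in range(1, len(x)):
        h = decay * h + dt * b * x[k]
        out.append(h)
    return np.array(out)


def trapezoidal_scan(a, b, x, dt, lam=0.5):
    """Second-order update mixing the previous and current inputs (starts from h_0 = 0):
    h_k = exp(a*dt)*h_{k-1} + (1 - lam)*dt*exp(a*dt)*b*x_{k-1} + lam*dt*b*x_k.
    The x_{k-1} term acts like a width-2 causal convolution on the input."""
    decay = np.exp(a * dt)
    h = 0.0
    out = []
    for k in range(1, len(x)):
        h = decay * h + (1 - lam) * dt * decay * b * x[k - 1] + lam * dt * b * x[k]
        out.append(h)
    return np.array(out)


# Toy problem with a known closed-form solution: h' = -h + sin(t), h(0) = 0.
dt = 0.1
t = np.arange(0.0, 10.0 + dt / 2, dt)
x = np.sin(t)
exact = 0.5 * (np.sin(t) - np.cos(t) + np.exp(-t))[1:]

euler_err = np.max(np.abs(euler_scan(-1.0, 1.0, x, dt) - exact))
trap_err = np.max(np.abs(trapezoidal_scan(-1.0, 1.0, x, dt) - exact))
print(f"Euler max error: {euler_err:.4f}, trapezoidal max error: {trap_err:.5f}")
```

With the same step size, the trapezoidal error is much smaller because it is second order in `dt`. Setting `lam=1.0` drops the previous-input term and recovers the Euler-style update exactly.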
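The RoPE trick from the complex-valued segment can be checked numerically. This is a minimal sketch, assuming a 2-dimensional real state (equivalent to one complex number) with a data-dependent decay `alpha[t]` and rotation angle `theta[t]`. The direct scan rotates the state at every step, while the RoPE-style scan keeps a decay-only recurrence and instead rotates B and C by the cumulative angle, the way RoPE rotates queries and keys. The helper names and random inputs are illustrative assumptions, not Mamba-3's kernel.

```python
import numpy as np


def rot(phi):
    """2x2 rotation matrix; acting on (re, im) it multiplies by exp(i*phi)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])


def rotating_scan(alpha, theta, B, C, x):
    """Direct form: h_t = alpha_t * R(theta_t) h_{t-1} + B_t x_t,  y_t = C_t . h_t."""
    h = np.zeros(2)
    ys = []
    for t in range(len(x)):
        h = alpha[t] * (rot(theta[t]) @ h) + B[t] * x[t]
        ys.append(C[t] @ h)
    return np.array(ys)


def rope_scan(alpha, theta, B, C, x):
    """RoPE-style form: no rotation inside the recurrence. B_t and C_t are
    rotated by -Phi_t, where Phi_t = theta_1 + ... + theta_t is the cumulative angle."""
    phi = np.cumsum(theta)
    h = np.zeros(2)
    ys = []
    for t in range(len(x)):
        b_rot = rot(-phi[t]) @ B[t]
        c_rot = rot(-phi[t]) @ C[t]
        h = alpha[t] * h + b_rot * x[t]  # decay only, as in a real-valued SSM
        ys.append(c_rot @ h)
    return np.array(ys)


rng = np.random.default_rng(0)
T = 64
alpha = rng.uniform(0.5, 1.0, T)       # data-dependent decay
theta = rng.uniform(-np.pi, np.pi, T)  # data-dependent rotation angle
B = rng.standard_normal((T, 2))
C = rng.standard_normal((T, 2))
x = rng.standard_normal(T)

y_direct = rotating_scan(alpha, theta, B, C, x)
y_rope = rope_scan(alpha, theta, B, C, x)
print(np.allclose(y_direct, y_rope))  # True
```

The two scans agree because 2-D rotations commute: the rotated state h_t equals R(Phi_t) times the unrotated one, so the rotation can be moved out of the recurrence and onto the input and output projections.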
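The arithmetic-intensity argument in the MIMO segment is mostly bookkeeping, so a back-of-the-envelope cost model helps. The sketch below assumes a per-head state of shape N x P that must be read from and written back to memory on every decode step, and a rank-R update h <- alpha*h + B X^T with readout Y = h^T C, where B and C are N x R and X is P x R (R = 1 is the single-input, single-output case). These shapes, the 2-byte element size, and the function name `decode_step_cost` are hypothetical choices for illustration, not figures from the paper.

```python
def decode_step_cost(N, P, R, bytes_per_elem=2):
    """Hypothetical cost model for one recurrent decode step of a single head.

    State h: N x P. Inputs B, C: N x R; X: P x R. Output Y: P x R.
    Returns (flops, bytes_moved, arithmetic_intensity)."""
    decay_flops = N * P                # alpha * h
    update_flops = 2 * N * P * R       # h += B @ X.T (multiply + add)
    readout_flops = 2 * N * P * R      # Y = h.T @ C (multiply + add)
    flops = decay_flops + update_flops + readout_flops

    state_elems = 2 * N * P                # read h, write h back
    io_elems = 2 * N * R + P * R + P * R   # read B, C, X; write Y
    bytes_moved = bytes_per_elem * (state_elems + io_elems)
    return flops, bytes_moved, flops / bytes_moved


for R in (1, 2, 4, 8):
    flops, bytes_moved, ai = decode_step_cost(N=128, P=64, R=R)
    print(f"R={R}: {flops:>7} FLOPs, {bytes_moved:>6} bytes, {ai:.2f} FLOPs/byte")
```

Under these assumptions, going from R = 1 to R = 4 roughly triples the FLOPs per byte while the bytes moved grow by only about 7%, because memory traffic is dominated by the fixed-size state. That is the sense in which the extra compute is nearly free on memory-bound hardware.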