In this episode:
• Introduction to State-Tracking: Linda and Professor Norris introduce the paper and discuss the historical context of state-tracking in sequence models.
• The Next-Token Prediction Testbed: The hosts discuss how the authors serialized tracking tasks as Python REPL traces with print statements, evaluating models via next-token prediction rather than a sequence-to-sequence setup.
• DeltaNet Triumphs Over Transformers: Linda explains how DeltaNet with an extended eigenvalue range extrapolated perfectly on the tracking task, while Transformers failed even with dense supervision.
• The Catch: Partial Observability: Professor Norris questions the limits, leading Linda to introduce Probabilistic Finite-State Automata with State Reveals (PFSA-SR) and unobservable branching.
• The Math of Norm Decay: A deep dive into why linear RNNs suffer exponential norm decay without non-linear renormalization, wrapping up the episode's takeaways.
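To make the testbed segment concrete, here is a minimal mock-up of the kind of setup described: a state-tracking problem rendered as a Python REPL trace whose print statements supply the next-token supervision. This is my own illustration, not the authors' actual data pipeline; the function name and the swap-based task are assumptions for the sketch.

```python
def make_repl_trace(swaps):
    """Render a variable-swap tracking task as a REPL-style trace.

    Each print line's comment is the value the model must predict
    token-by-token, so tracking the state is required to score well.
    """
    lines = ["x, y, z = 1, 2, 3"]
    state = {"x": 1, "y": 2, "z": 3}
    for a, b in swaps:
        lines.append(f"{a}, {b} = {b}, {a}")
        state[a], state[b] = state[b], state[a]
        # The printed value supervises the model's tracked state.
        lines.append(f"print({a})  # {state[a]}")
    return "\n".join(lines)

print(make_repl_trace([("x", "y"), ("y", "z")]))
```

A model that merely pattern-matches the surface text cannot fill in the printed values; it has to simulate the interpreter's state.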
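The eigenvalue point from the DeltaNet segment can be seen on the smallest state-tracking task, parity. This is my own toy illustration of the negative-eigenvalue idea, not the paper's DeltaNet code: a linear recurrence whose per-step transition can take the value -1 flips the sign of its state and so tracks parity exactly, while one restricted to eigenvalues in [0, 1] can never flip sign.

```python
def parity_via_linear_recurrence(bits):
    """Track parity with h_t = a_t * h_{t-1}, a_t in {-1, +1}."""
    h = 1.0
    for b in bits:
        a = -1.0 if b == 1 else 1.0  # eigenvalue extended to [-1, 1]
        h = a * h
    return 0 if h > 0 else 1  # sign of the state encodes parity

print(parity_via_linear_recurrence([1, 0, 1, 1]))  # → 1 (odd number of ones)
```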
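For the partial-observability segment, here is a toy rendering of the PFSA-SR idea as described in the episode, not the paper's formalism; the function name, the two-state automaton, and the reveal schedule are my own assumptions. Transitions branch on a hidden coin flip, and the true state is emitted only at designated reveal steps.

```python
import random

def run_pfsa_sr(n_steps, reveal_every, seed=0):
    """Walk a 2-state probabilistic automaton with periodic state reveals.

    Between reveals the observer sees only '?', so multiple hidden
    trajectories are consistent with the visible output.
    """
    rng = random.Random(seed)
    state = 0
    observations = []
    for t in range(1, n_steps + 1):
        # Unobservable branching: the transition depends on a hidden coin.
        if rng.random() < 0.5:
            state = 1 - state
        observations.append(state if t % reveal_every == 0 else "?")
    return observations

print(run_pfsa_sr(6, 3))
```

A model trained on such traces must maintain a belief over hidden states rather than a single definite one, which is exactly the limit the hosts probe.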
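The norm-decay discussion can be sketched in one dimension. A minimal illustration (mine, not the paper's): a scalar linear RNN h_t = λ·h_{t-1} + x_t with |λ| < 1 shrinks the contribution of early inputs as λ^t, which is the exponential decay that non-linear renormalization would counteract.

```python
def linear_rnn(inputs, lam):
    """Run a 1-D linear recurrence and return the final state."""
    h = 0.0
    for x in inputs:
        h = lam * h + x
    return h

# Feed a single impulse at t=0 followed by zeros: the surviving state
# is exactly lam**(T-1), i.e. exponentially small in the horizon T.
T = 50
impulse = [1.0] + [0.0] * (T - 1)
print(linear_rnn(impulse, lam=0.9))  # 0.9**49, roughly 0.0057
```

With |λ| = 1 the impulse survives indefinitely, which connects this bullet back to the extended-eigenvalue discussion earlier in the episode.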