In this episode:
• Introduction to the Compute Divide: Linda and Professor Norris open the podcast and discuss the massive computational barriers in modern LLM pretraining before turning to the HRM-Text paper.
• Biological Inspiration and the HRM Architecture: The hosts discuss how the human brain's frontoparietal loop inspired the dual-timescale Hierarchical Recurrent Model, breaking down the fast L-module and slow H-module.
• Stabilizing Recurrence with MagicNorm: Professor Norris questions the stability of recurrent networks, and Linda explains how MagicNorm and warmup deep credit assignment tame vanishing and exploding gradients.
• Rethinking the Objective: PrefixLM and Task-Completion: Linda reveals that the model trains exclusively on instruction-response pairs, dropping raw text entirely, and explains the efficiency of the PrefixLM masking strategy.
• Results and the Democratization of AI: The hosts review the staggering benchmark results achieved on a $1,500 budget and discuss what this means for graduate students and independent researchers.