In this episode:
• The Memory Bottleneck: Professor Norris and Linda introduce the paper 'Conditional Memory via Scalable Lookup' and debate the inefficiency of using expensive neural computation to simulate simple knowledge retrieval.
• Engram: N-grams Strike Back: Linda breaks down the 'Engram' module, explaining how it uses hashed N-grams and context-aware gating to inject static embeddings directly into the Transformer backbone (see the first sketch after this list).
• The U-Shaped Curve of Sparsity: The hosts discuss the 'Sparsity Allocation' problem, analyzing how a fixed sparse-parameter budget should be split between MoE experts and memory capacity, and the finding that a hybrid allocation yields superior results.
• Deepening the Network Without Layers: A discussion on mechanistic analysis, focusing on how Engram handles static patterns like named entities in early layers, freeing up the model's attention for complex reasoning.
• Prefetching the Future: Linda and Norris explore the system-level advantages of deterministic lookups, including offloading massive embedding tables to CPU memory (see the second sketch after this list), and conclude the episode.
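
For listeners who want the mechanism from the Engram segment in code, here is a minimal sketch of a hashed N-gram memory with a context-aware gate, as we understand the idea. Every name here (HashedNgramMemory, num_buckets, the multiplicative bigram hash) is our own placeholder, not the paper's API.

```python
# Minimal sketch of a hashed N-gram memory with a context-aware gate.
# All module and argument names are placeholders, not the paper's API.
import torch
import torch.nn as nn

class HashedNgramMemory(nn.Module):
    def __init__(self, d_model: int, num_buckets: int = 1_000_000):
        super().__init__()
        self.num_buckets = num_buckets
        # Large static embedding table, addressed by hashed N-gram IDs.
        self.table = nn.Embedding(num_buckets, d_model)
        # Context-aware gate: decides, per position and channel, how much of
        # the retrieved memory to inject, conditioned on the hidden state.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())

    def _hash_bigrams(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq) of int64. Build (t-1, t) bigram IDs with a
        # cheap multiplicative hash; a real system would use a stronger hash
        # and likely several N-gram orders.
        prev = torch.roll(token_ids, shifts=1, dims=1)
        prev[:, 0] = 0  # no left context at the first position
        return (prev * 1_000_003 + token_ids) % self.num_buckets

    def forward(self, hidden: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model) activations from the Transformer backbone.
        retrieved = self.table(self._hash_bigrams(token_ids))   # (batch, seq, d_model)
        g = self.gate(torch.cat([hidden, retrieved], dim=-1))   # gate values in [0, 1]
        # Residual injection of gated static memory into the backbone stream.
        return hidden + g * retrieved
```

The gate is the 'context-aware' part: the backbone can suppress the retrieved embedding whenever the surrounding context matters more than the raw N-gram match.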
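And a hypothetical illustration of the prefetching point from the closing segment: because the N-gram IDs depend only on the input token IDs, the rows a memory layer will need are known before the forward pass reaches it, so they can live in a CPU-resident table and be copied to the GPU asynchronously. Function and variable names below are ours, not from the paper or any particular framework.

```python
# Hypothetical sketch of why deterministic lookups help at the system level:
# the rows an Engram-style layer will need are known from token IDs alone,
# so they can be gathered from a CPU-resident table and copied to the GPU
# asynchronously, overlapping with earlier layers' compute.
import torch

def prefetch_memory_rows(cpu_table: torch.Tensor,
                         ngram_ids: torch.Tensor,
                         copy_stream: torch.cuda.Stream) -> torch.Tensor:
    """Gather the needed rows on CPU and launch an async host-to-device copy."""
    rows = cpu_table.index_select(0, ngram_ids.flatten().cpu())  # CPU-side gather
    with torch.cuda.stream(copy_stream):
        # Pinned memory plus non_blocking=True lets this copy overlap with
        # whatever is running on the default stream.
        gpu_rows = rows.pin_memory().to("cuda", non_blocking=True)
    return gpu_rows.view(*ngram_ids.shape, cpu_table.shape[-1])

# Usage inside a forward pass (names are ours):
#   copy_stream = torch.cuda.Stream()
#   gpu_rows = prefetch_memory_rows(table_cpu, ids, copy_stream)  # kick off early
#   ... run earlier Transformer layers on the default stream ...
#   torch.cuda.current_stream().wait_stream(copy_stream)          # sync before use
```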