Best AI papers explained

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models



The researchers introduce Engram, a conditional memory module that augments Large Language Models with a scalable lookup mechanism for static knowledge. While modern models rely on Mixture-of-Experts (MoE) for sparse computation, Engram retrieves formulaic and factual information via N-gram embedding lookups in constant time. This architectural shift reveals a U-shaped scaling law that balances neural computation against static memory, allowing the model to offload simple retrieval to its early layers. By delegating local patterns to these lookups, the transformer's attention capacity is freed for complex reasoning and long-context processing. Experiments show that an Engram-augmented 27B model significantly outperforms standard MoE baselines on math, coding, and general-reasoning benchmarks. The system also supports offloading its massive parameter tables to host memory, keeping computational overhead minimal.
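To make the core idea concrete, here is a minimal sketch of what an N-gram lookup memory might look like. This is an illustrative toy, not the paper's implementation: the class name, hashing scheme, and table layout are all assumptions; the point is only that each position's local N-gram maps to a row of a fixed embedding table in constant time, with no attention or matrix multiply involved.

```python
import numpy as np

class NgramLookupMemory:
    """Hypothetical sketch of a conditional-memory module: each token's
    trailing n-gram is hashed into a fixed embedding table, so retrieving
    static knowledge is a single O(1) table lookup per position."""

    def __init__(self, table_size=2**16, dim=8, n=2, seed=0):
        rng = np.random.default_rng(seed)
        # In practice this table would be learned (and could be offloaded
        # to host memory); here it is random for illustration.
        self.table = rng.standard_normal((table_size, dim))
        self.table_size = table_size
        self.n = n

    def _hash(self, ngram):
        # Simple polynomial rolling hash; the paper's actual hashing
        # scheme may differ.
        h = 0
        for tok in ngram:
            h = (h * 1000003 + tok) % self.table_size
        return h

    def __call__(self, token_ids):
        # One constant-time lookup per position: the trailing n-gram
        # ending at position i selects one embedding row.
        out = np.zeros((len(token_ids), self.table.shape[1]))
        for i in range(len(token_ids)):
            ngram = tuple(token_ids[max(0, i - self.n + 1): i + 1])
            out[i] = self.table[self._hash(ngram)]
        return out

mem = NgramLookupMemory()
embeddings = mem([5, 9, 2, 5, 9])  # one embedding per token position
```

Because identical local patterns hash to the same row, positions that share an N-gram retrieve the same memory vector, which is the sense in which the module captures static, pattern-keyed knowledge rather than context-dependent computation.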


Best AI papers explained, by Enoch H. Kang