
The researchers introduce Engram, a novel conditional memory module that enhances Large Language Models by integrating a scalable lookup mechanism for static knowledge. While modern models rely on Mixture-of-Experts (MoE) for sparse computation, Engram uses N-gram embeddings to retrieve formulaic or factual information in constant time. This architectural shift creates a U-shaped scaling law that balances neural processing with static memory, allowing the model to offload simple retrieval tasks to early layers. By delegating local patterns to these lookups, the transformer's attention capacity is preserved for complex reasoning and long-context processing. Experiments show that an Engram-augmented 27B model significantly outperforms standard MoE baselines in math, coding, and general reasoning. Furthermore, the system supports offloading massive parameter tables to host memory, ensuring high efficiency with minimal computational overhead.
By Enoch H. Kang
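The constant-time N-gram lookup described above can be illustrated with a minimal sketch: token N-grams are hashed into a fixed-size embedding table, so retrieving static or formulaic knowledge costs O(1) per position regardless of table size. This is a hypothetical illustration, not the paper's implementation; the class name `EngramSketch`, the table size, and the hashing scheme are all assumptions made for the example.

```python
import numpy as np

class EngramSketch:
    """Hypothetical sketch of an N-gram lookup memory (not the paper's code).

    Trailing token N-grams are hashed into a fixed embedding table, giving
    constant-time retrieval of static knowledge that could then be mixed
    into the hidden states of early transformer layers.
    """

    def __init__(self, table_size=2**16, dim=64, n=2, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed-size parameter table; in the paper's setting such tables can
        # be large and offloaded to host memory, since access is sparse.
        self.table = rng.standard_normal((table_size, dim)).astype(np.float32)
        self.table_size = table_size
        self.n = n

    def _bucket(self, ngram):
        # Hash the N-gram of token ids into a table index: O(1) per lookup.
        return hash(ngram) % self.table_size

    def lookup(self, token_ids):
        # For each position, retrieve the embedding of its trailing N-gram.
        out = np.zeros((len(token_ids), self.table.shape[1]), dtype=np.float32)
        for i in range(len(token_ids)):
            ngram = tuple(token_ids[max(0, i - self.n + 1): i + 1])
            out[i] = self.table[self._bucket(ngram)]
        return out

mem = EngramSketch()
emb = mem.lookup([5, 17, 5, 17])
print(emb.shape)  # one retrieved vector per token position
```

Because identical N-grams always hash to the same bucket, repeated local patterns map to the same stored vector, which is the sense in which the memory handles formulaic content and frees attention layers for longer-range reasoning.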