Learning GenAI via SOTA Papers

EP163: Why AI Models Only Remember Five Percent



The paper "Language Model Memory and Memory Models for Language" explores the capacity of machine learning models to store input information in hidden layer vector embeddings. The research identifies that standard causal language models typically produce "information-poor" embeddings because the objective of next-token prediction does not require the model to retain arbitrary input details. In contrast, autoencoders designed for input regeneration demonstrate nearly perfect memory formation.

To improve memory retention and computational efficiency, the author introduces a parallelizable encoder-decoder memory model architecture. Key contributions and findings include:

  • Training Paradigms: The paper proposes using combined objective functions—pairing next-token prediction with information-retention tasks like copying—to help models form information-rich memories.
  • Curriculum Learning: The paper introduces a streamlined training approach in which a high-fidelity encoder is frozen and the decoders are first trained to process memories, then trained on next-token prediction.
  • Computational Efficiency: Substituting token sequences with memory embeddings reduces the time-to-first-token, minimizes KV cache sizes, and increases token throughput during inference.
  • Benchmark Performance: Models trained with these combined objectives show significant improvements in input information-related benchmarks without compromising general language understanding.
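
To make the KV cache point above concrete, here is a rough back-of-the-envelope sketch. It uses the standard size estimate for a transformer decoder's KV cache (keys plus values, per layer, per position). The model dimensions, the 4096-token input, and the assumption that every 512 tokens are replaced by a single memory embedding are hypothetical values chosen for illustration, not figures from the paper.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV cache size for one sequence in a standard transformer decoder."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def compressed_positions(n_tokens, tokens_per_memory):
    """Positions the decoder attends over if each chunk becomes one memory embedding."""
    # Ceiling division: a partial final chunk still needs one embedding.
    return -(-n_tokens // tokens_per_memory)


# Hypothetical configuration (assumption, not taken from the paper).
config = dict(n_layers=16, n_kv_heads=8, head_dim=64)
n_tokens = 4096
tokens_per_memory = 512  # assumed compression ratio

full = kv_cache_bytes(n_tokens, **config)
compact = kv_cache_bytes(compressed_positions(n_tokens, tokens_per_memory), **config)

print(f"token cache:  {full / 2**20:.2f} MiB")
print(f"memory cache: {compact / 2**20:.2f} MiB")
print(f"reduction:    {full // compact}x")
```

Under these assumptions the cache shrinks in direct proportion to the number of positions the decoder attends over, which is also why fewer positions need to be processed before the first output token.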

The findings also have implications for retrieval-based models, suggesting that current embedding models often lack the necessary information density to identify arbitrary details within text chunks.


By Yun Wu