In a June 13, 2025 paper, a joint collaboration between Stanford University, Caltech, and the University at Buffalo introduces CARTRIDGES, a method for efficiently handling long text corpora in large language models that addresses the high memory cost of standard In-Context Learning (ICL) and its required KV cache. A CARTRIDGE is a smaller, trained KV-cache representation of a corpus, created offline using a technique termed SELF-STUDY. This training process generates synthetic conversational data about the corpus and applies a context-distillation objective so that the CARTRIDGE retains the generality and structural awareness of ICL while dramatically reducing memory consumption (up to 38.6x less) and increasing throughput. The research demonstrates that CARTRIDGES can match or exceed ICL performance, enable context-length extrapolation beyond the model's native window, and even be composed together at inference time. The paper also includes detailed ablation studies on the SELF-STUDY components and a theoretical analysis contrasting this gradient-descent approach with other memory methods, such as linear attention, on synthetic memory tasks. Source: https://arxiv.org/pdf/2506.06266
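The core of context distillation is training a small set of parameters so that the student's output distribution matches the teacher's (the model with the full corpus in context). As a minimal toy sketch, not the paper's implementation, the snippet below trains a tiny "cartridge" parameter vector (a stand-in for the trained KV cache) to minimize the KL divergence to a hypothetical teacher next-token distribution; all names and values are illustrative assumptions.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) for discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical teacher distribution: next-token probabilities produced by the
# model with the entire corpus in its context (stand-in values, 4-token vocab).
teacher = [0.7, 0.2, 0.05, 0.05]

# "Cartridge": a small trainable parameter vector standing in for the trained
# KV cache; here it directly parameterizes the student's next-token logits.
cartridge = [0.0, 0.0, 0.0, 0.0]

lr = 0.5
for step in range(500):
    student = softmax(cartridge)
    # The gradient of KL(teacher || student) w.r.t. the logits is
    # student - teacher, so plain gradient descent is one line.
    cartridge = [c - lr * (s - t) for c, s, t in zip(cartridge, student, teacher)]

final_kl = kl_divergence(teacher, softmax(cartridge))
print(f"KL after training: {final_kl:.6f}")
```

In the paper's actual setting the trainable parameters are full key/value tensors prepended to the model's cache and the teacher/student are real transformer forward passes; this toy only illustrates the distillation objective itself.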