In a June 13, 2025 paper, a joint collaboration between Stanford University, Caltech, and the University at Buffalo introduces CARTRIDGES, a method for efficiently handling long text corpora in large language models that addresses the high memory cost of standard In-Context Learning (ICL) and its required KV cache. A CARTRIDGE is a smaller, trained KV-cache representation of a corpus, created offline using a technique termed SELF-STUDY. This training process generates synthetic conversational data about the corpus and applies a context-distillation objective so that the CARTRIDGE retains the generality and structural awareness of ICL while dramatically reducing memory consumption (up to 38.6x less) and increasing throughput. The research demonstrates that CARTRIDGES can match or exceed ICL performance, enable context-length extrapolation beyond the model's native window, and even be composed together at inference time. The paper also includes detailed ablation studies on the SELF-STUDY components and a theoretical analysis contrasting this gradient-descent approach with other memory methods, such as linear attention, on synthetic memory tasks. Source: https://arxiv.org/pdf/2506.06266
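The core of context distillation is training a small set of parameters so that the student's output distribution matches the teacher's (the model with the full corpus in context). As a minimal toy sketch, not the paper's implementation, the snippet below trains a tiny "cartridge" parameter vector (a stand-in for the trained KV cache) to minimize the KL divergence to a hypothetical teacher next-token distribution; all names and values are illustrative assumptions.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) for discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical teacher distribution: next-token probabilities produced by the
# model with the entire corpus in its context (stand-in values, 4-token vocab).
teacher = [0.7, 0.2, 0.05, 0.05]

# "Cartridge": a small trainable parameter vector standing in for the trained
# KV cache; here it directly parameterizes the student's next-token logits.
cartridge = [0.0, 0.0, 0.0, 0.0]

lr = 0.5
for step in range(500):
    student = softmax(cartridge)
    # The gradient of KL(teacher || student) w.r.t. the logits is
    # student - teacher, so plain gradient descent is one line.
    cartridge = [c - lr * (s - t) for c, s, t in zip(cartridge, student, teacher)]

final_kl = kl_divergence(teacher, softmax(cartridge))
print(f"KL after training: {final_kl:.6f}")
```

In the paper's actual setting the trainable parameters are full key/value tensors prepended to the model's cache and the teacher/student are real transformer forward passes; this toy only illustrates the distillation objective itself.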