AI Post Transformers

NeurIPS 2025: Homogeneous Keys, Heterogeneous Values



This research presents a novel method for efficient long-context modeling in Large Language Models (LLMs), tackling the quadratic complexity of attention through KV cache compression. The core discovery is a fundamental local KV cache asymmetry: adjacent attention keys exhibit high structural homogeneity, while their associated value vectors have distinct, heterogeneous distributions. To capitalize on this finding, the authors propose AsymKV, a training-free compression framework that shifts information loss from the heterogeneous values onto the homogeneous keys. AsymKV applies homogeneity-based merging to keys using a mathematically derived optimal merging vector, paired with a lossless value representation scheme that uses cardinality-aware normalization to preserve vital information. Extensive empirical results on benchmarks such as LongBench, across diverse models including LLaMA3.1-8B, show that AsymKV consistently surpasses state-of-the-art long-context methods in accuracy and information retention while remaining practical at inference time. Source: https://arxiv.org/pdf/2506.05410
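The asymmetry can be illustrated with a small toy sketch. The snippet below is an assumption-laden simplification, not the paper's actual method: it merges each run of identical adjacent keys into one representative (a stand-in for the derived optimal merging vector), pre-averages each group's values (a stand-in for the lossless value representation), and adds a log-cardinality bias to each merged logit so one slot counts as several positions (a stand-in for cardinality-aware normalization). When keys within a group are exactly identical, this compressed attention reproduces the full-cache output.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
sizes = [3, 1, 2, 2]  # hypothetical group cardinalities along the sequence

# Keys: identical within each group (the "homogeneous keys" side of the asymmetry).
reps = rng.normal(size=(len(sizes), d))
keys = np.concatenate([np.tile(r, (s, 1)) for r, s in zip(reps, sizes)])
values = rng.normal(size=(keys.shape[0], d))  # heterogeneous values
q = rng.normal(size=d)

def attend(q, K, V, bias=0.0):
    """Single-query softmax attention, with an optional per-key logit bias."""
    logits = K @ q + bias
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ V

# Full attention over the uncompressed cache.
full = attend(q, keys, values)

# Compressed cache: one key per group, the per-group mean of its values,
# and a log(cardinality) bias so each merged slot weighs like `s` positions.
V_m, start = [], 0
for s in sizes:
    V_m.append(values[start:start + s].mean(axis=0))
    start += s
compressed = attend(q, reps, np.stack(V_m), bias=np.log(np.array(sizes)))

print(np.allclose(full, compressed))  # True: exact when keys are perfectly homogeneous
```

In practice adjacent keys are only approximately homogeneous, so merging them introduces a small, bounded error, which is exactly why the framework directs the loss toward keys rather than the heterogeneous values.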

AI Post Transformers, by mcgrof