This research presents a novel method for efficient long-context modeling in Large Language Models (LLMs), tackling the quadratic complexity of attention through KV cache compression. The core discovery is a local KV cache asymmetry: adjacent attention keys exhibit high structural homogeneity, while their associated value vectors follow distinct, heterogeneous distributions. Building on this finding, the authors propose AsymKV, a training-free compression framework that shifts information loss from the heterogeneous values onto the homogeneous keys. AsymKV merges adjacent keys into a mathematically derived optimal vector and pairs this with a lossless value representation scheme that uses cardinality-aware normalization to preserve critical information. Empirical results on benchmarks such as LongBench, across models including LLaMA3.1-8B, show that AsymKV consistently surpasses state-of-the-art long-context methods in accuracy and information retention while maintaining practical inference efficiency. Source: https://arxiv.org/pdf/2506.05410
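The paper's exact merging formula and normalization are not reproduced in this summary, but the minimal sketch below illustrates the general shape of such an asymmetric scheme under explicit assumptions: each group of adjacent keys is replaced by its mean (the vector minimizing total squared distance to the group's keys, a natural stand-in for the paper's derived optimum), grouped values are averaged, and a log-cardinality bias on the attention logits restores each group's total attention mass. The function names `merge_kv_groups` and `attend`, the group size, and the bias construction are all illustrative assumptions, not the authors' implementation.

```python
import math
import torch


def merge_kv_groups(keys: torch.Tensor, values: torch.Tensor, group_size: int = 4):
    """Sketch of asymmetric KV compression (assumptions, not the paper's code).

    Per group of `group_size` adjacent positions:
      - merged key   = group mean, exploiting the observed key homogeneity
        (the mean minimizes total squared distance to the group's keys);
      - merged value = group mean of values;
      - logit bias   = log(group_size), a cardinality-aware correction so the
        merged slot carries the attention mass of all members.

    keys, values: [seq_len, head_dim] for a single attention head.
    """
    seq_len, head_dim = keys.shape
    n_groups = seq_len // group_size  # trailing remainder tokens are dropped here

    k = keys[: n_groups * group_size].view(n_groups, group_size, head_dim)
    v = values[: n_groups * group_size].view(n_groups, group_size, head_dim)

    merged_k = k.mean(dim=1)                              # homogeneity-based key merging
    merged_v = v.mean(dim=1)                              # per-group value average
    bias = torch.full((n_groups,), math.log(group_size))  # cardinality-aware bias
    return merged_k, merged_v, bias


def attend(query: torch.Tensor, merged_k: torch.Tensor,
           merged_v: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    """Single-query attention over the compressed cache, applying the
    per-slot cardinality bias to the scaled logits."""
    d = merged_k.shape[-1]
    logits = (merged_k @ query) / math.sqrt(d) + bias  # [n_groups]
    weights = torch.softmax(logits, dim=0)
    return weights @ merged_v                          # [head_dim]
```

The `log(group_size)` term exponentiates to a factor of `group_size` in the softmax numerator, so when a group's keys are nearly identical (the regime the key-homogeneity observation suggests), the single merged slot approximately reproduces the attention mass its members would have received individually.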