The paper, posted October 16, 2025, introduces Elastic-Cache, a training-free method that accelerates inference in diffusion large language models (DLMs) by optimizing Key-Value (KV) cache management. Standard DLMs decode slowly because they recompute the KV cache for all tokens at every step, even though the cache changes little between steps, especially in shallow layers. Elastic-Cache replaces this with an adaptive, layer-aware refresh policy: a lightweight attention-aware drift test on the most-attended token decides *when* a refresh is necessary, and a depth-aware schedule decides *where* to recompute, restricting updates to the deeper, more volatile layers. The method also applies block-wise caching to distant MASK tokens to further reduce computational overhead. In experiments, this approach achieves substantial throughput speedups, up to 45.1× on longer sequences, with negligible accuracy loss relative to baseline and fixed-period caching methods.

Source: https://arxiv.org/pdf/2510.14973
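The two decisions the policy makes, *when* to refresh and *where* to recompute, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the normalized-drift metric, and the threshold `tau` are all assumptions introduced here for clarity.

```python
import numpy as np

def should_refresh(cached_keys, fresh_keys, attn_weights, tau=0.02):
    """Attention-aware drift test (hypothetical sketch): compare the key
    vector of the most-attended token between the cached state and a
    freshly computed one; signal a refresh only when its relative drift
    exceeds the threshold tau."""
    star = int(np.argmax(attn_weights))  # index of the most-attended token
    drift = np.linalg.norm(fresh_keys[star] - cached_keys[star])
    drift /= np.linalg.norm(cached_keys[star]) + 1e-8  # relative change
    return bool(drift > tau)

def layers_to_recompute(num_layers, boundary_layer):
    """Depth-aware schedule (hypothetical sketch): keep cached KV entries
    for shallow layers and recompute only from boundary_layer downward,
    where the cache is more volatile."""
    return list(range(boundary_layer, num_layers))

# Example: identical keys -> no refresh; a perturbed most-attended key -> refresh.
cached = np.ones((4, 8))
attn = np.array([0.1, 0.6, 0.2, 0.1])   # token 1 is most attended
print(should_refresh(cached, cached.copy(), attn))   # no drift
perturbed = cached.copy()
perturbed[1] += 1.0                      # drift on the most-attended token
print(should_refresh(cached, perturbed, attn))
print(layers_to_recompute(12, 8))        # deep layers only
```

When the drift test fires, a decoder following this scheme would recompute the KV cache only for the layers returned by `layers_to_recompute`, leaving shallow-layer entries untouched.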