AI Post Transformers

CacheSlide: Position-Aware KV Cache Reuse for Agent LLMs



This episode examines CacheSlide from USENIX FAST26, a system that enables LLMs to reuse cached key-value pairs across shifting prompt positions in agentic workflows. The paper introduces chunked contextual position encoding and priority-based eviction to solve the position mismatch problem that prevents KV cache reuse when prompt segments shift in multi-turn agent conversations.
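To make the position mismatch problem concrete, here is a minimal toy sketch (illustrative names only, not the paper's actual design): chunks are cached by content hash with positions stored relative to the chunk start, so a cached chunk can be "slid" to a new absolute offset when earlier prompt segments change, and the lowest-priority chunk is evicted when the cache is full.

```python
import hashlib


class ChunkKVCache:
    """Toy sketch of position-aware KV reuse with priority eviction.

    All names here are assumptions for illustration; the real CacheSlide
    system operates on tensor KV pairs inside an LLM serving stack.
    """

    def __init__(self, capacity=4):
        self.capacity = capacity
        # chunk_hash -> (list of (relative_position, kv_entry), priority)
        self.cache = {}

    @staticmethod
    def _key(chunk_tokens):
        # Content-based key: the same chunk text hits the cache even if
        # it appears at a different position in a later prompt.
        return hashlib.sha256(" ".join(chunk_tokens).encode()).hexdigest()

    def put(self, chunk_tokens, relative_kv, priority):
        if len(self.cache) >= self.capacity:
            # Priority-based eviction: drop the least-important chunk.
            victim = min(self.cache, key=lambda k: self.cache[k][1])
            del self.cache[victim]
        self.cache[self._key(chunk_tokens)] = (relative_kv, priority)

    def get(self, chunk_tokens, new_offset):
        """Reuse a cached chunk at a new absolute position by re-basing
        its stored relative positions onto new_offset."""
        entry = self.cache.get(self._key(chunk_tokens))
        if entry is None:
            return None
        relative_kv, _ = entry
        return [(new_offset + rel, kv) for rel, kv in relative_kv]
```

For example, a tool-output chunk cached at the start of one turn can be retrieved with `get(tokens, new_offset=10)` after a later turn pushes it ten positions deeper into the prompt, instead of being recomputed from scratch.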

AI Post Transformers, by mcgrof