The Gist Talk

Offloading LLM Attention: Q-Shipping and KV-Side Compute



The source provides an extensive overview of strategies, collectively termed Q-shipping and KV-side compute, aimed at overcoming the memory bandwidth bottleneck during Large Language Model (LLM) inference, particularly in the decode phase.
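
For context, Q-shipping refers to sending the small per-step query to wherever the large KV cache resides (e.g., host DRAM or a memory-side device) and computing attention there, so only tiny vectors cross the slow link instead of the whole cache. Below is a minimal numpy sketch of that idea; the function name `kv_side_attention` and the tensor sizes are illustrative assumptions, not details from the episode.

```python
import numpy as np

def kv_side_attention(q, K, V):
    """Single-head attention computed next to the KV cache.

    Only the small query vector q (shape [d]) crosses the slow link;
    K and V (shape [seq_len, d]) stay where they live, and only the
    d-dimensional attention output is shipped back.
    """
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)              # [seq_len]
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V                       # [d]

# Illustrative sizes for one decode step of one attention head.
seq_len, d = 32_768, 128
rng = np.random.default_rng(0)
q = rng.standard_normal(d).astype(np.float32)             # shipped: ~0.5 KB
K = rng.standard_normal((seq_len, d)).astype(np.float32)  # stays put: ~16 MB
V = rng.standard_normal((seq_len, d)).astype(np.float32)  # stays put: ~16 MB

out = kv_side_attention(q, K, V)
moved_kb = (q.nbytes + out.nbytes) / 1024
kept_mb = (K.nbytes + V.nbytes) / 2**20
print(f"bytes crossing the link: {moved_kb:.1f} KB vs. KV cache held in place: {kept_mb:.0f} MB")
```

The asymmetry the sketch prints (roughly 1 KB moved versus tens of megabytes kept in place, per head per decode step) is what makes this class of offloading attractive when the interconnect, not compute, is the bottleneck.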


The Gist Talk, by kw