In this episode, we sit down with Solution Architect Robert Alvarez to discuss the technology behind Pure Key-Value Accelerator (KVA) and its role in accelerating AI inference. Pure KVA is a protocol-agnostic key-value caching solution that, when combined with FlashBlade data storage, dramatically improves GPU efficiency and consistency in AI environments. Robert, whose background includes time as a Santa Clara University professor and a NASA Solution Architect, as well as work at CERN, explains why this innovation is essential for serving an entire fleet of AI workloads, including modern agentic and chatbot interfaces.
Robert dives into the massive growth of the AI inference market, driven by demand for near-real-time, low-latency AI applications, a trend that makes a solution like Pure KVA critical. He details how KVA removes the GPU memory bottleneck and shares compelling benchmark results: up to twenty times faster inference with NFS and six times faster with S3, all over standard Ethernet. These performance gains are key to helping enterprises scale more efficiently and reduce overall GPU costs.
Beyond the technical deep dive, the episode explores the origin of the KVA idea, the unique Pure IP that enables it, and future integrations such as Dynamo and the partnership with Comet for LLM observability. In the popular “Hot Takes” segment, Robert offers his perspective on the blind spots IT leaders might have in managing AI data and shares the advice he would give his younger self about the future of the data management space.
To learn more about Pure KVA, visit purestorage.com/launch.
Check out the new Pure Storage digital customer community to join the conversation with peers and Pure experts:
https://purecommunity.purestorage.com/
00:00 Intro and Welcome
02:21 Background on Our Guest
06:57 Stat of the Episode on AI Inferencing Spend
09:10 Why AI Inference is Difficult at Scale
11:00 How KV Cache Acceleration Works
14:50 Key Partnerships Using KVA
20:28 Hot Takes Segment