Seventy3

[Episode 211] Prompt Caching in Large Language Model APIs


Seventy3: Using NotebookLM to walk through research papers, focusing on artificial intelligence, large models, and robotics algorithms, so everyone can keep learning alongside AI.

To join the group, add our assistant on WeChat: seventy3_podcast

Friend-request note: 小宇宙 (Xiaoyuzhou)

Today's topic: Auditing Prompt Caching in Language Model APIs

Summary

The paper investigates prompt caching in large language model APIs, showing that this optimization introduces data-dependent timing variations that can be exploited as a side channel. Through statistical audits of real-world APIs, the authors detected global cache sharing across users at several providers, including OpenAI, which creates a privacy risk: an attacker can infer information about other users' prompts from response times. The study also shows that timing differences can leak details of the underlying model architecture, for example indicating that OpenAI's embedding model is likely a decoder-only Transformer. Finally, the paper discusses potential mitigations and emphasizes the importance of transparency about API caching policies.
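
As a rough illustration of the kind of timing audit described above, the sketch below sends pairs of requests that share a long random prefix and tests whether the second (potentially cached) request is systematically faster. The endpoint URL, model name, and request format are hypothetical placeholders, and the one-sided Mann-Whitney U test is just one simple statistical choice; it is not necessarily the procedure used in the paper.

```python
import time
import random
import string
import statistics

import requests  # any HTTP client would do
from scipy.stats import mannwhitneyu

API_URL = "https://api.example.com/v1/chat/completions"  # hypothetical endpoint
API_KEY = "sk-..."  # placeholder credential


def time_request(prompt: str) -> float:
    """Send one prompt and return the wall-clock latency in seconds."""
    start = time.perf_counter()
    requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "some-model",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 1,  # keep generation short to reduce noise
        },
        timeout=30,
    )
    return time.perf_counter() - start


def random_prefix(n_chars: int = 2000) -> str:
    """Generate a long random prefix that is unlikely to be cached already."""
    return "".join(random.choices(string.ascii_letters + " ", k=n_chars))


def audit(num_trials: int = 25, alpha: float = 0.05) -> None:
    hit_times, miss_times = [], []
    for _ in range(num_trials):
        prefix = random_prefix()
        # First request with a fresh prefix: expected cache miss.
        miss_times.append(time_request(prefix + " Question A?"))
        # Second request reusing the same long prefix: cache hit if prefix caching is on.
        hit_times.append(time_request(prefix + " Question B?"))

    # One-sided test: are "hit" latencies systematically smaller than "miss" latencies?
    _, p_value = mannwhitneyu(hit_times, miss_times, alternative="less")
    print(f"median miss={statistics.median(miss_times):.3f}s, "
          f"median hit={statistics.median(hit_times):.3f}s, p={p_value:.4f}")
    if p_value < alpha:
        print("Timing difference detected: consistent with prompt caching.")
    else:
        print("No significant timing difference detected.")


if __name__ == "__main__":
    audit()
```

To probe global cache sharing rather than per-user caching, the same idea would be run with the "miss" and "hit" requests issued from two different accounts, which is where the privacy concern arises.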


Original paper: https://arxiv.org/abs/2502.07776

