CL | CR | LG | Feb 11, 2025

Auditing Prompt Caching in Language Model APIs

arXiv:2502.07776v2 | 17 citations | h-index: 12 | ICML
Originality: Incremental advance
AI Analysis

This work addresses privacy and transparency issues for users of LLM APIs by exposing vulnerabilities in caching policies, though it is incremental in building on known timing attack risks.

The paper addresses data-dependent timing variations caused by prompt caching in LLM APIs, which create a risk of side-channel attacks. Using statistical audits, the authors detected global cache sharing across users in seven API providers, including OpenAI, revealing potential privacy leakage and evidence about model architecture that was not previously public.

Prompt caching in large language models (LLMs) results in data-dependent timing variations: cached prompts are processed faster than non-cached prompts. These timing differences introduce the risk of side-channel timing attacks. For example, if the cache is shared across users, an attacker could identify cached prompts from fast API response times to learn information about other users' prompts. Because prompt caching may cause privacy leakage, transparency around the caching policies of API providers is important. To this end, we develop and conduct statistical audits to detect prompt caching in real-world LLM API providers. We detect global cache sharing across users in seven API providers, including OpenAI, resulting in potential privacy leakage about users' prompts. Timing variations due to prompt caching can also result in leakage of information about model architecture. Namely, we find evidence that OpenAI's embedding model is a decoder-only Transformer, which was previously not publicly known.
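The idea of a timing-based audit can be illustrated with a small sketch. The code below is not the authors' procedure. It assumes a one-sided permutation test on the difference in mean response time, and the function name `permutation_p_value`, its parameters, and the latency values are all hypothetical and synthetic. In a real audit, the "hit" sample would come from re-sending a prompt that should already be cached and the "miss" sample from sending fresh prompts.

```python
import random
from statistics import mean


def permutation_p_value(hit_times, miss_times, n_perm=2000, seed=0):
    """One-sided permutation test (illustrative, not the paper's exact test).

    Null hypothesis: no caching, so "hit" and "miss" response times are
    exchangeable. A small p-value suggests the "hit" prompts were served
    faster, which is consistent with prompt caching.
    """
    rng = random.Random(seed)
    observed = mean(miss_times) - mean(hit_times)
    pooled = list(hit_times) + list(miss_times)
    n_hit = len(hit_times)

    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the shuffled pool into pseudo "hit" and "miss" groups.
        diff = mean(pooled[n_hit:]) - mean(pooled[:n_hit])
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Synthetic latencies in seconds (made up for illustration only).
cached_like = [0.10 + 0.001 * i for i in range(20)]
uncached_like = [0.30 + 0.001 * i for i in range(20)]

p = permutation_p_value(cached_like, uncached_like)
print(f"p-value: {p:.4f}")
```

A permutation test is used here only because it needs no distributional assumptions and no third-party libraries. Any two-sample test that compares the two latency distributions could fill the same role.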

Code Implementations: 1 repo
