CR · LG · OS · Aug 11, 2025

Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference

arXiv:2508.08438v1 · 14 citations · h-index: 6
Originality: Highly original
AI Analysis

This work addresses the privacy risk that timing side-channels pose to LLM users while maintaining performance. Its hybrid approach is novel rather than incremental.

The paper tackles timing side-channel attacks in LLM inference that arise from global KV-cache sharing. It introduces SafeKV, which selectively shares non-sensitive cache entries while confining sensitive content to private caches. The results show that SafeKV mitigates 94%-97% of attacks, improves time-to-first-token (TTFT) by up to 40.58% compared with per-user isolation, and reduces cache-induced TTFT overhead from 50.41% to 11.74% on Qwen3-235B.
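The selective-sharing idea can be illustrated with a minimal sketch. This is not SafeKV's implementation: the class `SelectiveCache`, the regex in `SENSITIVE`, and the use of whole prompt strings as cache keys are all assumptions made for illustration (a real serving system would key on token-block prefixes and store actual KV tensors).

```python
import re
from collections import defaultdict

# Hypothetical rule-based detector: flags e-mail addresses and long digit runs.
SENSITIVE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+|\d{9,}")


def is_sensitive(text: str) -> bool:
    """Return True if the text matches any of the illustrative patterns."""
    return SENSITIVE.search(text) is not None


class SelectiveCache:
    """Toy cache: one shared public pool plus one private pool per user."""

    def __init__(self):
        self.public = {}                  # prompt -> cached KV placeholder
        self.private = defaultdict(dict)  # user -> {prompt -> KV placeholder}

    def lookup(self, user, prompt):
        # A user may hit the public pool or only their own private pool,
        # so another user's sensitive entry can never produce a fast hit.
        if prompt in self.public:
            return self.public[prompt]
        return self.private[user].get(prompt)

    def store(self, user, prompt, kv):
        # Sensitive prompts never enter the shared pool.
        if is_sensitive(prompt):
            self.private[user][prompt] = kv
        else:
            self.public[prompt] = kv


cache = SelectiveCache()
cache.store("alice", "Summarize the history of Rome", "kv1")
cache.store("alice", "My email is alice@example.com", "kv2")
```

In this sketch a second user gets a cache hit on the harmless prompt but a miss on the one containing an e-mail address, so the timing difference an attacker would look for exists only for non-sensitive content.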

Global KV-cache sharing has emerged as a key optimization for accelerating large language model (LLM) inference. However, it exposes a new class of timing side-channel attacks, enabling adversaries to infer sensitive user inputs via shared cache entries. Existing defenses, such as per-user isolation, eliminate leakage but degrade performance by up to 38.9% in time-to-first-token (TTFT), making them impractical for high-throughput deployment. To address this gap, we introduce SafeKV (Secure and Flexible KV Cache Sharing), a privacy-aware KV-cache management framework that selectively shares non-sensitive entries while confining sensitive content to private caches. SafeKV comprises three components: (i) a hybrid, multi-tier detection pipeline that integrates rule-based pattern matching, a general-purpose privacy detector, and context-aware validation; (ii) a unified radix-tree index that manages public and private entries across heterogeneous memory tiers (HBM, DRAM, SSD); and (iii) entropy-based access monitoring to detect and mitigate residual information leakage. Our evaluation shows that SafeKV mitigates 94%-97% of timing-based side-channel attacks. Compared with per-user isolation, SafeKV improves TTFT by up to 40.58% and throughput by up to 2.66× across diverse LLMs and workloads. SafeKV also reduces cache-induced TTFT overhead from 50.41% to 11.74% on Qwen3-235B. By combining fine-grained privacy control with high cache reuse efficiency, SafeKV reclaims the performance advantages of global sharing while providing robust runtime privacy guarantees for LLM inference.
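The abstract does not say which distribution the entropy-based monitor is computed over. As one possible reading, the sketch below assumes the monitor tracks which users hit a given shared cache entry and treats many hits concentrated in very few users as a sign of probing. The function names and the thresholds `min_hits` and `max_entropy` are hypothetical.

```python
import math
from collections import Counter


def access_entropy(user_ids):
    """Shannon entropy (in bits) of the users that hit one cache entry."""
    counts = Counter(user_ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_suspicious(user_ids, min_hits=8, max_entropy=1.0):
    """Flag an entry with many hits that come from very few distinct users.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    return len(user_ids) >= min_hits and access_entropy(user_ids) < max_entropy


# Accesses spread evenly over four users: high entropy, not flagged.
spread = ["a", "b", "c", "d"] * 3
# One user hitting the same entry repeatedly: zero entropy, flagged.
probing = ["a"] * 10
```

Under these assumptions, an entry flagged by `looks_suspicious` could be demoted from the public pool to a private one, which is one way the "mitigate" half of the component might be realized.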

