ShadowKV
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
KV-cache compression · first seen Oct 28, 2024
superseded — cited as a baseline and beaten by newer methods
4 papers critique it · 6 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites ShadowKV as a baseline.
“However, limited by the accuracy constraints of intra-layer SVD, ShadowKV is forced to offload Value states to the CPU, leaving inference speed bounded by PCIe bandwidth.”
— xKV: Cross-Layer SVD for KV-Cache Compression
“We systematically evaluate KV offloading methods on context-intensive tasks and observe significant accuracy drops”
— KV Cache Offloading for Context-Intensive Tasks
“ShadowKV does not support long-generation since the SVD is performed only once during prefill, leaving the low-rank key unupdated during decoding.”
— FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
“Unfortunately, their coarse-grained retrieval strategies often overlook fine-grained dependencies and incur high I/O overhead.”
— HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference
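The xKV and FreeKV critiques above both turn on ShadowKV's low-rank key factorization. As a rough illustration only (this is not ShadowKV's code), the NumPy sketch below factors a synthetic key matrix with a truncated SVD at "prefill" and then measures how well the frozen basis represents keys it never saw. The function name `low_rank_keys`, the matrix sizes, the rank, and the random data are all assumptions made for this example; real key caches are neither exactly low-rank nor random.

```python
import numpy as np


def low_rank_keys(keys: np.ndarray, rank: int):
    """Factor a (tokens, dim) key matrix into two thin matrices via truncated SVD."""
    u, s, vt = np.linalg.svd(keys, full_matrices=False)
    coeffs = u[:, :rank] * s[:rank]   # (tokens, rank): per-token coefficients
    basis = vt[:rank, :]              # (rank, dim): shared basis with orthonormal rows
    return coeffs, basis


rng = np.random.default_rng(0)
tokens, dim, rank = 512, 64, 8        # hypothetical sizes, chosen for illustration

# Synthetic "prefill" keys: exactly low-rank plus a little noise (an assumption).
prefill = rng.standard_normal((tokens, rank)) @ rng.standard_normal((rank, dim))
prefill += 0.01 * rng.standard_normal((tokens, dim))

coeffs, basis = low_rank_keys(prefill, rank)
prefill_err = np.linalg.norm(prefill - coeffs @ basis) / np.linalg.norm(prefill)

# Synthetic "decode" keys that the one-time SVD never saw. Projecting them onto
# the frozen basis shows how much of them that basis can represent.
decode = rng.standard_normal((16, dim))
decode_proj = decode @ basis.T @ basis
decode_err = np.linalg.norm(decode - decode_proj) / np.linalg.norm(decode)

# Storage for the factors versus the dense prefill keys.
dense_floats = tokens * dim
factored_floats = coeffs.size + basis.size

print(f"prefill relative error: {prefill_err:.4f}")
print(f"decode relative error:  {decode_err:.4f}")
print(f"floats stored: {factored_floats} factored vs {dense_floats} dense")
```

In this toy setup the prefill keys reconstruct almost exactly because they were built to be low-rank, while the unseen keys lose most of their norm because they were drawn at random. Real decode-time keys would sit somewhere in between; the sketch only shows the mechanism the FreeKV quote points at.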
Beaten on benchmarks
Head-to-head results where a newer method reports beating ShadowKV. Values are copied from the source paper's tables — verify against the cited paper.
- xKV: Cross-Layer SVD for KV-Cache Compression
xKSR (Ours) beats ShadowKV · Avg. [Llama-3.1-8B-Instruct, Comp. 1.59 (7.76)]
89.97 vs 88.80
- xKV: Cross-Layer SVD for KV-Cache Compression
xKSR (Ours) beats ShadowKV · Avg. [Llama-3.1-8B-Instruct, Comp. 1.63 (8.90)]
89.70 vs 87.17
- xKV: Cross-Layer SVD for KV-Cache Compression
xKSR (Ours) beats ShadowKV · Avg. [Llama-3.1-8B-Instruct, Comp. 1.68 (10.45)]
88.34 vs 64.91
- xKV: Cross-Layer SVD for KV-Cache Compression
xKVSR (Ours) beats ShadowKV · Avg. [Llama-3.1-8B-Instruct, Comp. 4.37]
89.83 vs 86.32
- xKV: Cross-Layer SVD for KV-Cache Compression
xKVSR (Ours) beats ShadowKV · Avg. [Llama-3.1-8B-Instruct, Comp. 5.35]
89.69 vs 70.94
- xKV: Cross-Layer SVD for KV-Cache Compression
xKSR beats ShadowKV · Avg. [Llama-3.1-8B-Instruct, Comp. 1.68 (10.45)]
42.50 vs 40.51
- xKV: Cross-Layer SVD for KV-Cache Compression
xKVSR beats ShadowKV · Avg. [Llama-3.1-8B-Instruct, Comp. 1.63 (8.90)]
42.69 vs 42.21
- xKV: Cross-Layer SVD for KV-Cache Compression
xKVSR beats ShadowKV · Avg. [Llama-3.1-8B-Instruct, Comp. 5.35]
42.40 vs 41.51
- KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference
KVDrive beats ShadowKV · Avg (RULER) [Qwen-3-8B]
68.07 vs 67.03
- SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention
SVDq+Sparsity beats ShadowKV · Average [Qwen2.5-14B-Instruct]
73.1 vs 72.6
- SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention
SVDq+Sparsity beats ShadowKV · Average [Qwen2.5-7B-Instruct]
66.8 vs 63.6
- SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention
SVDq+Sparsity beats ShadowKV · Average [Qwen2.5-3B-Instruct]
55.5 vs 51.8
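The score pairs above can be reduced to per-setting margins with a few lines of Python. The sketch below copies the values listed in this section as-is; the tuple layout and the `margin` helper are assumptions for illustration, and margins should only be compared within the same paper and setting, since the rows may come from different benchmarks and models.

```python
# (paper, newer method, setting, newer score, ShadowKV score), copied from the list above.
ROWS = [
    ("xKV", "xKSR", "Llama-3.1-8B-Instruct, Comp. 1.59 (7.76)", 89.97, 88.80),
    ("xKV", "xKSR", "Llama-3.1-8B-Instruct, Comp. 1.63 (8.90)", 89.70, 87.17),
    ("xKV", "xKSR", "Llama-3.1-8B-Instruct, Comp. 1.68 (10.45)", 88.34, 64.91),
    ("xKV", "xKVSR", "Llama-3.1-8B-Instruct, Comp. 4.37", 89.83, 86.32),
    ("xKV", "xKVSR", "Llama-3.1-8B-Instruct, Comp. 5.35", 89.69, 70.94),
    ("xKV", "xKSR", "Llama-3.1-8B-Instruct, Comp. 1.68 (10.45)", 42.50, 40.51),
    ("xKV", "xKVSR", "Llama-3.1-8B-Instruct, Comp. 1.63 (8.90)", 42.69, 42.21),
    ("xKV", "xKVSR", "Llama-3.1-8B-Instruct, Comp. 5.35", 42.40, 41.51),
    ("KVDrive", "KVDrive", "Qwen-3-8B, Avg (RULER)", 68.07, 67.03),
    ("SVDq", "SVDq+Sparsity", "Qwen2.5-14B-Instruct", 73.1, 72.6),
    ("SVDq", "SVDq+Sparsity", "Qwen2.5-7B-Instruct", 66.8, 63.6),
    ("SVDq", "SVDq+Sparsity", "Qwen2.5-3B-Instruct", 55.5, 51.8),
]


def margin(new: float, old: float) -> float:
    """Absolute score gap, rounded to two decimals."""
    return round(new - old, 2)


margins = [
    (paper, method, setting, margin(new, old))
    for paper, method, setting, new, old in ROWS
]
smallest = min(margins, key=lambda row: row[3])
largest = max(margins, key=lambda row: row[3])

# Print the gaps from largest to smallest.
for paper, method, setting, gap in sorted(margins, key=lambda row: row[3], reverse=True):
    print(f"{gap:6.2f}  {method} vs ShadowKV  [{setting}]  ({paper})")
```

On the values listed here, the gaps run from 0.48 to 23.43 points.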
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 28, 2026
- May 18, 2026
- Louver · Sparse Attention as a Range Searching Problem: Towards an Inference-Efficient Index for KV Cache · May 7, 2026
- Apr 12, 2026
- ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference · Mar 28, 2026
- DynSplit-KV: Dynamic Semantic Splitting for KVCache Compression in Efficient Long-Context LLM Inference · Feb 3, 2026
- HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference · Jan 20, 2026
- Dec 11, 2025
- CLO: Efficient LLM Inference System with CPU-Light KVCache Offloading via Algorithm-System Co-Design · Nov 18, 2025
- Oct 13, 2025