Palu
Palu: Compressing KV-Cache with Low-Rank Projection
KV-cache compression · first seen Jul 30, 2024
superseded — cited as a baseline and beaten by newer methods
4 papers critique it · 6 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Palu as a baseline.
“Palu [chang2024palucompressingkvcachelowrank] and ReCalKV [yan2025recalkv] factorize the model weights into low-rank matrices, cache compressed intermediate states, and reconstruct the full key and value tensors during attention. However, these methods often incur noticeable accuracy degradation due to lossy factorization.”
Source: OjaKV: Context-Aware Online Low-Rank KV Cache Compression with Oja's Rule
“However, as Palu [chang2024palu] points out, this will greatly introduce additional computation for recovering the key vectors.”
Source: SALS: Sparse Attention in Latent Space for KV cache Compression
“A key limitation is that reconstruction error is only an indirect proxy for attention and downstream layer behavior, and accuracy can degrade more sharply at higher compression.”
Source: Don't be so Stief! Learning KV Cache low-rank approximation over the Stiefel manifold
“However, this approach targets only the projection weights, while prior work [yu2023compressing] has shown that transformer weights typically have higher rank than the output features (keys/values), suggesting that data-dependent KV-cache compression is more effective.”
Source: KV-CoRE: Benchmarking Data-Dependent Low-Rank Compressibility of KV-Caches in LLMs
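The critiques above all refer to the same basic mechanism: factorize a projection weight into two low-rank matrices, cache the compressed intermediate state, and reconstruct the full keys at attention time. The following is a minimal NumPy sketch of that general idea, not Palu's actual implementation. The function name `low_rank_factorize`, the matrix sizes, and the random stand-in weights are all assumptions made for illustration; real model weights have a different spectrum, so the printed error says nothing about real accuracy.

```python
import numpy as np


def low_rank_factorize(W, rank):
    """Split W (d_model x d_head) into A @ B using a truncated SVD.

    Hypothetical helper for illustration only.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (d_model, rank), singular values folded in
    B = Vt[:rank, :]             # (rank, d_head)
    return A, B


rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
d_model, d_head, rank, seq_len = 64, 32, 8, 16

# Random stand-in for a key projection weight.
W_k = rng.standard_normal((d_model, d_head))
A, B = low_rank_factorize(W_k, rank)

# Hidden states for seq_len tokens.
X = rng.standard_normal((seq_len, d_model))

# Only the rank-dimensional latent is cached, instead of the full keys.
latent_cache = X @ A             # (seq_len, rank)

# At attention time the full keys are reconstructed from the latent.
# This extra matmul is the recovery cost the SALS quote refers to.
K_rec = latent_cache @ B         # (seq_len, d_head)

# Reference: keys computed with the unfactorized weight.
K_full = X @ W_k

rel_err = np.linalg.norm(K_full - K_rec) / np.linalg.norm(K_full)
cache_saving = 1 - rank / d_head

print(f"cached floats per token: {rank} instead of {d_head}")
print(f"cache saving: {cache_saving:.0%}, relative key error: {rel_err:.3f}")
```

The relative key error printed here is the kind of reconstruction metric that the Stiefel-manifold quote calls an indirect proxy: it measures how far the rebuilt keys are from the originals, not how much the attention output or downstream accuracy changes.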
Beaten on benchmarks
Head-to-head results where a newer method reports beating Palu. Values are copied from the source paper's tables — verify against the cited paper.
- ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
ReCalKV beats Palu · Average [LLaMA-7B, 50% compression]
62.54 vs 62.23
- ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
ReCalKV beats Palu · Average [LLaMA-7B, 60% compression]
60.44 vs 58.15
- ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
ReCalKV beats Palu · Average [LLaMA-7B, 70% compression]
58.79 vs 53.57
- ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
ReCalKV beats Palu · Average [LLaMA-2-7B, 50% compression]
63.64 vs 62.64
- ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
ReCalKV beats Palu · Average [LLaMA-2-7B, 60% compression]
61.97 vs 58.85
- ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
ReCalKV beats Palu · Average [LLaMA-2-7B, 70% compression]
29.62 vs 13.26
- CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
CommonKV beats Palu · Avg. [Llama3.1-8B-Instruct, ratio 0.3]
72.31 vs 72.14
- CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
CommonKV beats Palu · Avg. [Llama3.1-8B-Instruct, ratio 0.5]
71.59 vs 68.79
- CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
CommonKV beats Palu · Avg. [Llama3.1-8B-Instruct, ratio 0.6]
68.19 vs 50.59
- CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
CommonKV beats Palu · Avg. [Mistral-v0.2-7B-Instruct, ratio 0.3]
71.84 vs 71.39
- CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
CommonKV beats Palu · Avg. [Mistral-v0.2-7B-Instruct, ratio 0.5]
70.57 vs 68.21
- CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
CommonKV beats Palu · Avg. [Mistral-v0.2-7B-Instruct, ratio 0.6]
68.61 vs 63.07
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 21, 2026
- May 8, 2026
- Mar 24, 2026
- Mar 17, 2026
- Mar 15, 2026
- Feb 5, 2026
- Jan 29, 2026
- GPU-accelerated INT8 quantization for KV cache compression · GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models · Jan 8, 2026
- STA-Attention · Unlocking the Address Book: Dissecting the Sparse Semantic Structure of LLM Key-Value Caches via Sparse Autoencoders · Dec 11, 2025
- SWAN · SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression · Nov 24, 2025
- Oct 28, 2025
- Sep 25, 2025