Palu (KV-cache compression): superseded, meaning it is cited as a baseline and beaten by newer methods. 4 papers critique it and 6 beat it on benchmarks, which ranks it #13 of 234 most-superseded. Sub-problem: the cluster led by Palu. Newer alternatives in the same sub-problem include ArborKV, RDKV, EchoKV, VQKV, and Self-Indexing KVCache.

Superseded baseline · #13 of 234 most-superseded

Palu

Palu: Compressing KV-Cache with Low-Rank Projection

KV-cache compression · first seen Jul 30, 2024

superseded — cited as a baseline and beaten by newer methods

4 papers critique it · 6 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites Palu as a baseline.

“Palu [chang2024palucompressingkvcachelowrank] and ReCalKV [yan2025recalkv] factorize the model weights into low-rank matrices, cache compressed intermediate states, and reconstruct the full key and value tensors during attention. However, these methods often incur noticeable accuracy degradation due to lossy factorization.”
— OjaKV: Context-Aware Online Low-Rank KV Cache Compression with Oja's Rule
“However, as Palu [chang2024palu] points out, this will greatly introduce additional computation for recovering the key vectors.”
— SALS: Sparse Attention in Latent Space for KV cache Compression
“A key limitation is that reconstruction error is only an indirect proxy for attention and downstream layer behavior, and accuracy can degrade more sharply at higher compression.”
— Don't be so Stief! Learning KV Cache low-rank approximation over the Stiefel manifold
“However, this approach targets only the projection weights, while prior work [yu2023compressing] has shown that transformer weights typically have higher rank than the output features (keys/values), suggesting that data-dependent KV-cache compression is more effective.”
— KV-CoRE: Benchmarking Data-Dependent Low-Rank Compressibility of KV-Caches in LLMs
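
The scheme these critiques describe (factorize a projection weight into low-rank matrices, cache the compressed intermediate state, reconstruct keys at attention time) can be sketched roughly as follows. This is a minimal NumPy illustration using plain truncated SVD, not Palu's actual implementation; the names `factorize`, `W_k`, `A`, `B`, the matrix sizes, and the rank are all hypothetical, and random matrices stand in for real weights and hidden states.

```python
import numpy as np

def factorize(W, rank):
    """Split W (d_model x d_head) into A (d_model x rank) and B (rank x d_head)
    using a truncated SVD, so that A @ B approximates W."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # down-projection with singular values folded in
    B = Vt[:rank, :]             # up-projection used for reconstruction
    return A, B

rng = np.random.default_rng(0)
d_model, d_head, rank, seq_len = 64, 32, 8, 16   # illustrative sizes only

W_k = rng.standard_normal((d_model, d_head))     # stand-in for a key projection weight
X = rng.standard_normal((seq_len, d_model))      # stand-in for hidden states

A, B = factorize(W_k, rank)
latent = X @ A            # what is cached: seq_len x rank instead of seq_len x d_head
K_approx = latent @ B     # keys reconstructed at attention time
K_full = X @ W_k          # keys the uncompressed model would have cached

rel_err = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)
print(f"cached values per token: {rank} vs {d_head}")
print(f"relative reconstruction error: {rel_err:.3f}")
```

The extra `latent @ B` product is the recovery cost the SALS quote refers to, and `rel_err` is the kind of reconstruction error that the Stiefel-manifold quote calls only an indirect proxy for attention behavior. Because the factorization here is computed from `W_k` alone, it is also an instance of the weight-only compression that the KV-CoRE quote contrasts with data-dependent approaches.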

Beaten on benchmarks

Head-to-head results where a newer method reports beating Palu. Values are copied from the source paper's tables — verify against the cited paper.

ReCalKV beats Palu · Average
ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration

Model        Compression   ReCalKV   Palu
LLaMA-7B     50%           62.54     62.23
LLaMA-7B     60%           60.44     58.15
LLaMA-7B     70%           58.79     53.57
LLaMA-2-7B   50%           63.64     62.64
LLaMA-2-7B   60%           61.97     58.85
LLaMA-2-7B   70%           29.62     13.26

CommonKV beats Palu · Avg.
CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing

Model                      Ratio   CommonKV   Palu
Llama3.1-8B-Instruct       0.3     72.31      72.14
Llama3.1-8B-Instruct       0.5     71.59      68.79
Llama3.1-8B-Instruct       0.6     68.19      50.59
Mistral-v0.2-7B-Instruct   0.3     71.84      71.39
Mistral-v0.2-7B-Instruct   0.5     70.57      68.21
Mistral-v0.2-7B-Instruct   0.6     68.61      63.07
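
To read the results above as margins rather than raw scores, a small sketch like the following may help. It uses only the LLaMA-7B averages reported by ReCalKV as listed on this page; the helper name `margins` and the tuple layout are illustrative assumptions, and the numbers should still be verified against the cited paper.

```python
# Averages copied from the ReCalKV results listed above for LLaMA-7B:
# (compression level, ReCalKV average, Palu average)
rows = [
    ("50%", 62.54, 62.23),
    ("60%", 60.44, 58.15),
    ("70%", 58.79, 53.57),
]

def margins(rows):
    """Return (setting, newer-minus-Palu gap) pairs, rounded to two decimals."""
    return [(setting, round(newer - palu, 2)) for setting, newer, palu in rows]

for setting, gap in margins(rows):
    print(f"{setting}: +{gap:.2f}")
# prints:
# 50%: +0.31
# 60%: +2.29
# 70%: +5.22
```

On these rows the gap widens as compression increases, which is in line with the critique quoted earlier that accuracy can degrade more sharply at higher compression.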

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base. These include ArborKV, RDKV, EchoKV, VQKV, and Self-Indexing KVCache.