Superseded baseline · #30 of 234 most-superseded
Eigen Attention
Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
KV-cache compression · first seen Aug 10, 2024
superseded — cited as a baseline and beaten by newer methods
5 papers critique it · 1 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Eigen Attention as a baseline.
“EigenAttention [saxena2024eigen] and Zack [zhang2024zack] attempt to address this by incorporating both queries and keys in low-rank decompositions, yet their behavior largely resembles that of SVD-based methods that compress keys alone.”
— KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity
“Others, like Eigen Attention [saxena2024eigenattentionattentionlowrank], tackle the memory issue but require modifying model weights offline for a fixed compression level, sacrificing crucial runtime flexibility.”
— SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression
“A key limitation of EigenAttention and MatryoshkaKV lies in their use of a static basis: once computed, the projections remain fixed throughout inference. This assumption breaks down when inference prompts diverge from the calibration distribution (e.g., shifting from conversational text to code), leading to degraded approximation and reduced generation quality.”
— OjaKV: Context-Aware Online Low-Rank KV Cache Compression with Oja's Rule
“Eigen Attention [saxena2024eigen] has been proposed to compress the KV cache after applying RoPE but suffers a relatively large accuracy loss.”
— SALS: Sparse Attention in Latent Space for KV cache Compression
“they still optimize pre-softmax or intermediate proxies rather than the full decoder-layer output”
— Don't be so Stief! Learning KV Cache low-rank approximation over the Stiefel manifold
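The static-basis critique quoted above can be illustrated with a toy example. The following is a minimal sketch, not Eigen Attention's actual procedure: it assumes a plain SVD of synthetic calibration keys, and every name in it (`fit_static_basis`, `relative_error`, the dimensions, the noise level) is hypothetical. It fits a rank-8 basis once, then measures reconstruction error on keys drawn from the calibration subspace and on keys drawn from a different subspace.

```python
import numpy as np

def fit_static_basis(calib_keys, rank):
    """Return the top-`rank` right singular vectors of the calibration keys, shape (d, rank)."""
    _, _, vt = np.linalg.svd(calib_keys, full_matrices=False)
    return vt[:rank].T

def compress(keys, basis):
    """Project keys of shape (n, d) into the low-rank space, giving shape (n, rank)."""
    return keys @ basis

def reconstruct(codes, basis):
    """Map low-rank codes back to the original key dimension."""
    return codes @ basis.T

def relative_error(keys, basis):
    """Frobenius-norm reconstruction error relative to the norm of the keys."""
    approx = reconstruct(compress(keys, basis), basis)
    return np.linalg.norm(keys - approx) / np.linalg.norm(keys)

def sample_keys(rng, subspace, n, noise=0.01):
    """Synthetic keys lying close to the row space of `subspace` (illustrative data only)."""
    rank, d = subspace.shape
    return rng.standard_normal((n, rank)) @ subspace + noise * rng.standard_normal((n, d))

rng = np.random.default_rng(0)
d, rank, n = 64, 8, 512

# The basis is computed once from calibration data and then frozen.
calib_subspace = rng.standard_normal((rank, d))
basis = fit_static_basis(sample_keys(rng, calib_subspace, n), rank)

# Keys that resemble the calibration data versus keys from a different subspace.
in_dist_keys = sample_keys(rng, calib_subspace, n)
shifted_keys = sample_keys(rng, rng.standard_normal((rank, d)), n)

err_in = relative_error(in_dist_keys, basis)
err_shifted = relative_error(shifted_keys, basis)
print(f"in-distribution error: {err_in:.4f}")
print(f"shifted error:         {err_shifted:.4f}")
```

In this toy setting the frozen basis reconstructs calibration-like keys almost exactly and loses most of the energy of the shifted keys, which is the failure mode the OjaKV quote describes. Real key distributions are far less cleanly separated, so the size of the gap here says nothing about the size of the effect in practice.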
Beaten on benchmarks
Head-to-head results where a newer method reports beating Eigen Attention. Values are copied from the source paper's tables — verify against the cited paper.
- OjaKV: Context-Aware Online Low-Rank KV Cache Compression with Oja's Rule
OjaKV beats Eigen Attention · Avg-Acc [Llama-2-7B 0.8x]
63.57 vs 61.98
- OjaKV: Context-Aware Online Low-Rank KV Cache Compression with Oja's Rule
OjaKV beats Eigen Attention · Avg-Acc [Llama-3.1-8B 0.8x]
69.34 vs 68.83
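The margins implied by the two rows above can be computed directly. This is a small arithmetic sketch using only the values listed here; the dictionary layout and variable names are illustrative assumptions.

```python
# (OjaKV, Eigen Attention) Avg-Acc values as listed above.
results = {
    "Llama-2-7B 0.8x": (63.57, 61.98),
    "Llama-3.1-8B 0.8x": (69.34, 68.83),
}

# Absolute margin in Avg-Acc points, rounded to the precision of the listed values.
margins = {
    setting: round(ojakv - eigen, 2)
    for setting, (ojakv, eigen) in results.items()
}

for setting, margin in margins.items():
    print(f"{setting}: OjaKV ahead by {margin:.2f} points")
```

The reported gap is 1.59 points on Llama-2-7B and 0.51 points on Llama-3.1-8B.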
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 21, 2026
- May 8, 2026
- Mar 24, 2026
- Mar 17, 2026
- Mar 15, 2026
- Feb 5, 2026
- Jan 29, 2026
- GPU-accelerated INT8 quantization for KV cache compression · GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models · Jan 8, 2026
- STA-Attention · Unlocking the Address Book: Dissecting the Sparse Semantic Structure of LLM Key-Value Caches via Sparse Autoencoders · Dec 11, 2025
- SWAN · SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression · Nov 24, 2025
- Oct 28, 2025
- Sep 25, 2025