Is Eigen Attention superseded?

Eigen Attention (KV-cache compression) is superseded: newer methods cite it as a baseline and report beating it. 5 papers critique it and 1 beats it on benchmarks, which ranks it #30 of 234 most-superseded. It belongs to the sub-problem cluster led by Palu. Newer alternatives in the same sub-problem include ArborKV, RDKV, EchoKV, VQKV, and Self-Indexing KVCache.

Superseded baseline · #30 of 234 most-superseded

Eigen Attention

Eigen Attention: Attention in Low-Rank Space for KV Cache Compression

KV-cache compression · first seen Aug 10, 2024

superseded — cited as a baseline and beaten by newer methods

5 papers critique it · 1 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites Eigen Attention as a baseline.

“EigenAttention [saxena2024eigen] and Zack [zhang2024zack] attempt to address this by incorporating both queries and keys in low-rank decompositions, yet their behavior largely resembles that of SVD-based methods that compress keys alone.”
— KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity
“Others, like Eigen Attention [saxena2024eigenattentionattentionlowrank], tackle the memory issue but require modifying model weights offline for a fixed compression level, sacrificing crucial runtime flexibility.”
— SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression
“A key limitation of EigenAttention and MatryoshkaKV lies in their use of a static basis: once computed, the projections remain fixed throughout inference. This assumption breaks down when inference prompts diverge from the calibration distribution (e.g., shifting from conversational text to code), leading to degraded approximation and reduced generation quality.”
— OjaKV: Context-Aware Online Low-Rank KV Cache Compression with Oja's Rule
“Eigen Attention [saxena2024eigen] has been proposed to compress the KV cache after applying RoPE but suffers a relatively large accuracy loss.”
— SALS: Sparse Attention in Latent Space for KV cache Compression
“they still optimize pre-softmax or intermediate proxies rather than the full decoder-layer output”
— Don't be so Stief! Learning KV Cache low-rank approximation over the Stiefel manifold
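
The static-basis critique quoted from OjaKV above can be illustrated with a toy sketch. This is not the Eigen Attention or OjaKV implementation: it assumes synthetic "key" vectors that lie near a low-dimensional subspace, fits a fixed projection from a hypothetical calibration sample with a plain SVD, and then measures reconstruction error on data from the same subspace and from a different one. All names, sizes, and the noise level are assumptions chosen for illustration.

```python
import numpy as np

def fit_static_basis(calib, rank):
    """Return the top-`rank` right singular vectors of `calib` as a (d, rank) basis."""
    _, _, vt = np.linalg.svd(calib, full_matrices=False)
    return vt[:rank].T

def relative_error(x, basis):
    """Relative Frobenius error after projecting `x` onto `basis` and back."""
    recon = (x @ basis) @ basis.T
    return np.linalg.norm(x - recon) / np.linalg.norm(x)

rng = np.random.default_rng(0)
d, rank, n = 64, 8, 512  # hypothetical head dimension, kept rank, sample count

# Two random rank-8 subspaces: one stands in for the calibration
# distribution, the other for a shifted prompt distribution.
calib_dirs = np.linalg.qr(rng.standard_normal((d, rank)))[0]
shift_dirs = np.linalg.qr(rng.standard_normal((d, rank)))[0]

def sample(dirs):
    """Draw n vectors near the subspace spanned by `dirs`, plus small noise."""
    coeffs = rng.standard_normal((n, rank))
    return coeffs @ dirs.T + 0.01 * rng.standard_normal((n, d))

# The basis is computed once and then held fixed, as in a static scheme.
basis = fit_static_basis(sample(calib_dirs), rank)

err_in = relative_error(sample(calib_dirs), basis)
err_shift = relative_error(sample(shift_dirs), basis)
print(f"in-distribution error: {err_in:.3f}")
print(f"shifted-distribution error: {err_shift:.3f}")
```

In this toy setup the fixed basis reconstructs in-distribution vectors closely and shifted vectors poorly, which is the failure mode the quoted sentence describes. Real key distributions overlap far more than two random subspaces do, so the gap here overstates what a real model would show.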

Beaten on benchmarks

Head-to-head results where a newer method reports beating Eigen Attention. Values are copied from the source paper's tables — verify against the cited paper.

OjaKV beats Eigen Attention · Avg-Acc [Llama-2-7B 0.8x]
OjaKV 63.57 vs Eigen Attention 61.98
OjaKV: Context-Aware Online Low-Rank KV Cache Compression with Oja's Rule
OjaKV beats Eigen Attention · Avg-Acc [Llama-3.1-8B 0.8x]
OjaKV 69.34 vs Eigen Attention 68.83
OjaKV: Context-Aware Online Low-Rank KV Cache Compression with Oja's Rule
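
For a quick sense of scale, the margins implied by the two rows above can be computed directly. The snippet below is only arithmetic on the values listed on this page, with the first number in each pair taken as OjaKV and the second as Eigen Attention; the dictionary layout is an assumption for illustration.

```python
# (OjaKV, Eigen Attention) Avg-Acc values as listed above.
results = {
    "Llama-2-7B 0.8x": (63.57, 61.98),
    "Llama-3.1-8B 0.8x": (69.34, 68.83),
}

# Margin in accuracy points, rounded to the two decimals the table reports.
margins = {
    setting: round(ojakv - eigen, 2)
    for setting, (ojakv, eigen) in results.items()
}

for setting, margin in margins.items():
    print(f"{setting}: OjaKV ahead by {margin:.2f} points")
```

The reported gaps are 1.59 points on Llama-2-7B and 0.51 points on Llama-3.1-8B, so the advantage is noticeably smaller on the newer model.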

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.