Is Scissorhands superseded?

Scissorhands (KV-cache compression): superseded. It is cited as a baseline and beaten by newer methods: 8 papers critique it and 3 beat it on benchmarks, placing it #12 of 234 on the most-superseded list. Sub-problem: the cluster led by SnapKV. Newer alternatives in the same sub-problem include STaR-KV, GRKV, MomentKV, NestedKV, and IndexMem.


Superseded baseline · #12 of 234 most-superseded

Scissorhands

Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time

KV-cache compression · first seen May 26, 2023


What papers say

Verbatim critique sentences, each from a paper that cites Scissorhands as a baseline.

“Scissorhands retains only the KVs of recent tokens, sacrificing accuracy by discarding past context.”
— Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs
“However, these methods often fall into 'local myopia' because they only rely on the recent window to evaluate importance.”
— Meta-Soft: Leveraging Composable Meta-Tokens for Context-Preserving KV Cache Compression
“They fix the budget of KV Cache in a finite level, but don't distinguish the differences between layers and between heads.”
— LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
“this method concentrates solely on the window of previous pivotal tokens in generation and neglects the extensive prompt that contains essential information for generating accurate responses”
— SnapKV: LLM Knows What You are Looking for Before Generation
“these methodologies can induce numerous issues as the context contained in the evicted KVs is discarded exhaustively.”
— No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
“Compared to H2O, Scissorhands discards as many tokens as possible from the KV cache in each round, rather than just one token.”
— KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing
“Eviction-based approaches selectively retain critical cache entries using heuristics like attention scores and token positions, permanently discarding less critical entries [xiao2024StreamingLLM, zhang2023h2o, reid2024RoCo, liu2024scissorhands, li2024SnapKV, yang2024pyramidinfer] and thus causing context loss and potential hallucinations [zhang2024cam].”
— KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference
“Other methods like H2O [zhang2023h2o] and Scissorhands [liu2023scissorhands] leverage the attention to compress the KV cache. However, they treat the compression of different layers as the same thing and can not compress in the prefill phase.”
— PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference
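The KVSharer quote contrasts the two eviction styles: H2O drops one token per round, while Scissorhands drops a batch of tokens once the cache exceeds its budget. The sketch below illustrates only that batch-eviction pattern. It assumes that importance is approximated by attention mass accumulated over a recent history window (a stand-in for the "persistence of importance" idea in the paper title) and that the newest tokens are always protected. The function `compress_cache` and its parameters `budget`, `drop`, and `recent` are hypothetical names, and this is not the paper's exact scoring rule.

```python
from typing import List


def compress_cache(
    positions: List[int],
    importance: List[float],
    budget: int,
    drop: int,
    recent: int,
) -> List[int]:
    """Return the token positions kept after one eviction round (hypothetical sketch).

    positions:  token positions currently in the KV cache, oldest first.
    importance: assumed proxy score per cached token, e.g. attention mass
                accumulated over a recent history window.
    budget:     maximum cache size; eviction runs only when it is exceeded.
    drop:       number of tokens evicted in a single round (batch eviction).
    recent:     number of newest tokens that are never evicted.
    """
    if len(positions) != len(importance):
        raise ValueError("positions and importance must have equal length")
    if len(positions) <= budget:
        return list(positions)  # under budget: nothing to evict

    # Only tokens outside the protected recent window are eviction candidates.
    n_candidates = max(len(positions) - recent, 0)
    # Rank candidates by importance, least important first.
    ranked = sorted(range(n_candidates), key=lambda i: importance[i])
    evict = set(ranked[:drop])
    # Keep the surviving tokens in their original order.
    return [p for i, p in enumerate(positions) if i not in evict]


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.05, 0.7, 0.2, 0.3, 0.02, 0.5, 0.4, 0.6]
    print(compress_cache(list(range(10)), scores, budget=8, drop=3, recent=3))
    print(compress_cache(list(range(10)), scores, budget=8, drop=1, recent=3))
```

With `drop=3`, one call evicts tokens 2, 1, and 6 and leaves the cache below budget, so the next compression can be deferred for a few decoding steps. With `drop=1`, only token 6 is removed, which mirrors the one-token-per-round behavior that the quote attributes to H2O.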

Beaten on benchmarks

Head-to-head results where a newer method reports beating Scissorhands. Values are copied from the source paper's tables — verify against the cited paper.

NACL beats Scissorhands · Average score, NACL vs Scissorhands (higher is better)
30% KV cache budget: 31.0 vs 26.7
20% KV cache budget: 30.8 vs 21.9
10% KV cache budget: 29.4 vs 14.9
NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time
CONF-KV-L beats Scissorhands · Perplexity, lower is better [GPT-2, 2048 generated tokens, matched memory ~38.7 MB]
30.48 vs 31.94
CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM
QAQ beats Scissorhands · Compression ratio with less than 1% accuracy drop, zero-shot, QAQ vs Scissorhands (higher is better)
LLaMA 2-7B, HellaSwag: 7.477 vs 3
LLaMA 2-7B, PIQA: 7.477 vs 5
LLaMA 2-7B, MathQA: 6.036 vs 5
LLaMA 2-13B, HellaSwag: 8.394 vs 5
LLaMA 2-13B, PIQA: 9.024 vs 5
LLaMA 2-13B, MathQA: 6.056 vs 5
QAQ: Quality Adaptive Quantization for LLM KV Cache
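The head-to-head values above are easier to compare as differences and ratios. The short script below recomputes them from the copied numbers only and adds no new data. Each entry is stored as a (newer method, Scissorhands) pair, and the dictionary names are illustrative.

```python
# Values copied from the head-to-head results; each entry is (newer method, Scissorhands).
nacl_average = {  # KV cache budget -> average score (higher is better)
    "30%": (31.0, 26.7),
    "20%": (30.8, 21.9),
    "10%": (29.4, 14.9),
}
qaq_compression = {  # (model, zero-shot task) -> compression ratio at <1% accuracy drop
    ("LLaMA 2-7B", "HellaSwag"): (7.477, 3),
    ("LLaMA 2-7B", "PIQA"): (7.477, 5),
    ("LLaMA 2-7B", "MathQA"): (6.036, 5),
    ("LLaMA 2-13B", "HellaSwag"): (8.394, 5),
    ("LLaMA 2-13B", "PIQA"): (9.024, 5),
    ("LLaMA 2-13B", "MathQA"): (6.056, 5),
}
conf_kv_ppl = (30.48, 31.94)  # GPT-2 perplexity (lower is better)


def margins(table):
    """Absolute gap (newer minus Scissorhands) per setting, rounded to 2 places."""
    return {k: round(new - old, 2) for k, (new, old) in table.items()}


def ratios(table):
    """How many times larger the newer method's value is, rounded to 2 places."""
    return {k: round(new / old, 2) for k, (new, old) in table.items()}


if __name__ == "__main__":
    print("NACL margin (points):", margins(nacl_average))
    print("QAQ / Scissorhands compression ratio:", ratios(qaq_compression))
    print("CONF-KV perplexity reduction:", round(conf_kv_ppl[1] - conf_kv_ppl[0], 2))
```

In NACL's reported numbers, the margin widens from 4.3 to 8.9 to 14.5 points as the budget shrinks from 30% to 10%. This means Scissorhands loses accuracy faster at tight budgets in that comparison. QAQ's compression ratio is 1.21x to 2.49x that of Scissorhands across the six settings. CONF-KV-L lowers perplexity by 1.46.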

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.