Scissorhands
Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
KV-cache compression · first seen May 26, 2023
superseded — cited as a baseline and beaten by newer methods
8 papers critique it · 3 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Scissorhands as a baseline.
“Scissorhands retains only the KVs of recent tokens, sacrificing accuracy by discarding past context.”
— Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs
“However, these methods often fall into 'local myopia' because they only rely on the recent window to evaluate importance.”
— Meta-Soft: Leveraging Composable Meta-Tokens for Context-Preserving KV Cache Compression
“They fix the budget of KV Cache in a finite level, but don't distinguish the differences between layers and between heads.”
— LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
“this method concentrates solely on the window of previous pivotal tokens in generation and neglects the extensive prompt that contains essential information for generating accurate responses”
— SnapKV: LLM Knows What You are Looking for Before Generation
“these methodologies can induce numerous issues as the context contained in the evicted KVs is discarded exhaustively.”
— No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
“Compared to H2O, Scissorhands discards as many tokens as possible from the KV cache in each round, rather than just one token.”
— KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing
“Eviction-based approaches selectively retain critical cache entries using heuristics like attention scores and token positions, permanently discarding less critical entries [xiao2024StreamingLLM, zhang2023h2o, reid2024RoCo, liu2024scissorhands, li2024SnapKV, yang2024pyramidinfer] and thus causing context loss and potential hallucinations [zhang2024cam].”
— KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference
“Other methods like H$_2$O [zhang2023h2o] and Scissorhands [liu2023scissorhands] leverage the attention to compress the KV cache. However, they treat the compression of different layers as the same thing and can not compress in the prefill phase.”
— PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference
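Several of the critiques above turn on two mechanics: eviction driven by accumulated attention scores with a protected recent window, and whether a round drops one token or many. The toy sketch below illustrates only that contrast. It is not the Scissorhands or H2O implementation; the function `evict`, its parameters (`budget`, `recent`, `drop`), and the score values are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List


def evict(scores: Dict[int, float], budget: int, recent: int, drop: int) -> List[int]:
    """Return the token positions kept after one eviction round (toy model).

    scores : accumulated attention score per cached position (made-up values)
    budget : cache size above which an eviction round is triggered
    recent : number of most recent positions that are never evicted
    drop   : tokens evicted per round (1 = one at a time, >1 = batched)
    """
    positions = sorted(scores)
    if len(positions) <= budget:
        return positions  # under budget: nothing to evict
    protected = set(positions[-recent:]) if recent > 0 else set()
    candidates = [p for p in positions if p not in protected]
    # Lowest accumulated score is evicted first; ties break toward older positions.
    candidates.sort(key=lambda p: (scores[p], p))
    evicted = set(candidates[:drop])
    return [p for p in positions if p not in evicted]


# Hypothetical scores for six cached positions.
toy_scores = {0: 0.9, 1: 0.1, 2: 0.4, 3: 0.05, 4: 0.3, 5: 0.2}

one_at_a_time = evict(toy_scores, budget=5, recent=2, drop=1)
batched = evict(toy_scores, budget=5, recent=2, drop=3)
print(one_at_a_time)  # [0, 1, 2, 4, 5]
print(batched)        # [0, 4, 5]
```

In this toy setting the batched round frees more memory per trigger, but everything it drops is gone for good, which is the context-loss concern raised in the quotes above.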
Beaten on benchmarks
Head-to-head results where a newer method reports beating Scissorhands. Values are copied from the source paper's tables — verify against the cited paper.
- NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time
NACL beats Scissorhands · Average [30% KV cache budget]
31.0 vs 26.7
- NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time
NACL beats Scissorhands · Average [20% KV cache budget]
30.8 vs 21.9
- NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time
NACL beats Scissorhands · Average [10% KV cache budget]
29.4 vs 14.9
- CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM
CONF-KV-L beats Scissorhands · PPL [GPT-2, 2048 generated tokens, matched memory ~38.7 MB]
30.48 vs 31.94
- QAQ: Quality Adaptive Quantization for LLM KV Cache
QAQ beats Scissorhands · Compression ratio with less than 1% acc. drop [LLaMA 2-7B, HellaSwag-Zero shot]
7.477 vs 3
- QAQ: Quality Adaptive Quantization for LLM KV Cache
QAQ beats Scissorhands · Compression ratio with less than 1% acc. drop [LLaMA 2-7B, PIQA-Zero shot]
7.477 vs 5
- QAQ: Quality Adaptive Quantization for LLM KV Cache
QAQ beats Scissorhands · Compression ratio with less than 1% acc. drop [LLaMA 2-7B, MathQA-Zero shot]
6.036 vs 5
- QAQ: Quality Adaptive Quantization for LLM KV Cache
QAQ beats Scissorhands · Compression ratio with less than 1% acc. drop [LLaMA 2-13B, HellaSwag-Zero shot]
8.394 vs 5
- QAQ: Quality Adaptive Quantization for LLM KV Cache
QAQ beats Scissorhands · Compression ratio with less than 1% acc. drop [LLaMA 2-13B, PIQA-Zero shot]
9.024 vs 5
- QAQ: Quality Adaptive Quantization for LLM KV Cache
QAQ beats Scissorhands · Compression ratio with less than 1% acc. drop [LLaMA 2-13B, MathQA-Zero shot]
6.056 vs 5
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- STaR-KV · STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models · Jun 1, 2026
- May 29, 2026
- May 28, 2026
- May 26, 2026
- May 25, 2026
- CONF-KV · CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM · May 24, 2026
- May 21, 2026
- May 12, 2026
- Global Retention-Based KV Eviction · Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction · May 10, 2026
- ReST-KV · ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing · May 9, 2026
- May 8, 2026
- fixed-contract diagnostic · When Does Value-Aware KV Eviction Help? A Fixed-Contract Diagnostic for Non-Monotone Cache Compression · May 7, 2026