ThinK
ThinK: Thinner Key Cache by Query-Driven Pruning · KV-cache compression · first seen Jul 30, 2024
superseded — cited as a baseline and beaten by newer methods
2 papers critique it · 4 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites ThinK as a baseline.
“previous works on KV cache pruning have been limited to structured pruning, primarily due to the difficulty of efficiently leveraging finer-grained (i.e., unstructured) sparsity during execution.”
— Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
“direct truncation of the original channels, as exemplified by ThinK, leads to significant performance degradation when pursuing high compression ratios.”
— SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention
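The two critiques above contrast structured pruning (dropping whole key channels, which SVDq attributes to ThinK) with unstructured, per-element pruning. A minimal sketch of that contrast follows. The channel score used here (the product of the query-column and key-column L2 norms) is an illustrative query-driven criterion, assumed for this example and not confirmed against the ThinK paper. The function names and toy data are hypothetical.

```python
import math

def channel_scores(queries, keys):
    """Score channel c as ||Q[:, c]|| * ||K[:, c]|| (assumed, illustrative criterion)."""
    dim = len(keys[0])
    scores = []
    for c in range(dim):
        q_norm = math.sqrt(sum(q[c] ** 2 for q in queries))
        k_norm = math.sqrt(sum(k[c] ** 2 for k in keys))
        scores.append(q_norm * k_norm)
    return scores

def prune_channels(queries, keys, ratio):
    """Structured pruning: drop the lowest-scoring `ratio` fraction of key channels."""
    dim = len(keys[0])
    keep = dim - int(dim * ratio)
    scores = channel_scores(queries, keys)
    ranked = sorted(range(dim), key=lambda c: scores[c], reverse=True)
    kept = sorted(ranked[:keep])  # surviving channel indices, in original order
    pruned_keys = [[k[c] for c in kept] for k in keys]
    return kept, pruned_keys

def prune_elements(keys, ratio):
    """Unstructured pruning: zero the smallest-magnitude entries of each token's key."""
    out = []
    for k in keys:
        drop = int(len(k) * ratio)
        order = sorted(range(len(k)), key=lambda c: abs(k[c]))
        dropped = set(order[:drop])
        out.append([0.0 if c in dropped else v for c, v in enumerate(k)])
    return out

# Toy data: 2 recent queries, 3 cached keys, head dimension 4.
queries = [[1.0, 0.1, 0.5, 0.0], [0.8, 0.2, 0.4, 0.1]]
keys = [[2.0, 0.1, 1.0, 3.0], [1.5, 0.2, 0.9, 0.1], [1.0, 0.3, 1.1, 0.2]]

kept, pruned_keys = prune_channels(queries, keys, ratio=0.5)
sparse_keys = prune_elements(keys, ratio=0.5)
print(kept)            # indices of channels that survive structured pruning
print(sparse_keys[0])  # first token's key after per-element pruning
```

In this toy data, channel pruning removes channel 3 for every token, including the first token's large entry there, while per-element pruning keeps that entry. This only illustrates the difference in granularity the critiques describe; it says nothing about accuracy on real models.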
Beaten on benchmarks
Head-to-head results where a newer method reports beating ThinK. Values are copied from the source paper's tables — verify against the cited paper.
- CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
CommonKV beats ThinK · Avg. [Mistral-v0.2-7B-Instruct, ratio 0.3]
71.84 vs 70.72
- Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
K0.5 V0.5 beats ThinK · Avg [Llama-3-8B-Instruct, K0.5 V0.5 vs ThinK0.5]
42.65 vs 38.53
- Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
K0.7 V0.7 beats ThinK · Avg [Llama-3-8B-Instruct, K0.7 V0.7 vs ThinK0.7]
40.96 vs 26.55
- Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
K0.5 V0.5 beats ThinK · Avg [Mistral-7B-Instruct-v0.2, K0.5 V0.5 vs ThinK0.5]
42.30 vs 39.46
- Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
K0.7 V0.7 beats ThinK · Avg [Mistral-7B-Instruct-v0.2, K0.7 V0.7 vs ThinK0.7]
40.95 vs 32.44
- Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
K0.5 V0.5 beats ThinK · Avg [Llama-2-7B, K0.5 V0.5 vs ThinK0.5]
27.23 vs 25.69
- Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
K0.7 V0.7 beats ThinK · Avg [Llama-2-7B, K0.7 V0.7 vs ThinK0.7]
27.23 vs 21.57
- Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
Mustafar beats ThinK · Needle-Single1 [Llama-3.1-8B-Instruct, Key 70%]
1.000 vs 0.448
- Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
Mustafar beats ThinK · Needle-Single2 [Llama-3.1-8B-Instruct, Key 70%]
1.000 vs 0.490
- Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
Mustafar beats ThinK · Needle-Single1 [Llama-3.1-8B-Instruct, Value 70%]
1.000 vs 0.948
- Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
Mustafar beats ThinK · Needle-MultiKey1 [Llama-3.1-8B-Instruct, Value 70%]
1.000 vs 0.948
- Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
Mustafar beats ThinK · Needle-Single1 [Llama-3.1-8B-Instruct, Key&Value 70%]
1.000 vs 0.000
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 21, 2026
- May 8, 2026
- Mar 24, 2026
- Mar 17, 2026
- Mar 15, 2026
- Feb 5, 2026
- Jan 29, 2026
- GPU-accelerated INT8 quantization for KV cache compression · GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models · Jan 8, 2026
- STA-Attention · Unlocking the Address Book: Dissecting the Sparse Semantic Structure of LLM Key-Value Caches via Sparse Autoencoders · Dec 11, 2025
- SWAN · SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression · Nov 24, 2025
- Oct 28, 2025
- Sep 25, 2025