RTN (round-to-nearest quantization)
KV-cache compression
superseded — cited as a baseline and beaten by newer methods
0 papers critique it · 4 beat it on benchmarks
Beaten on benchmarks
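Head-to-head results in which a newer method reports beating RTN. In each pair the first value is the newer method's and the second is RTN's; for Average/Avg scores higher is better, and for perplexity (PPL) lower is better. Values are copied from the source papers' tables; verify them against the cited paper.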
- SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
SKVQ beats RTN · Average [Llama-2-7B-chat, Group-size 128, key-cache 2bit, value-cache 2bit, window-size 128]
37.50 vs 6.76
- SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
SKVQ beats RTN · Average [Llama-2-13B-chat, Group-size 128, key-cache 2bit, value-cache 2bit, window-size 128]
37.53 vs 10.15
- SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
SKVQ beats RTN · Average [Mistral-7B, Group-size 128, key-cache 2bit, value-cache 2bit, window-size 128]
43.47 vs 15.58
- SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
SKVQ beats RTN · Average [Mistral-7B-Instruct, Group-size 128, key-cache 2bit, value-cache 2bit, window-size 128]
46.23 vs 18.02
- SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
SKVQ beats RTN · PPL [4bit, group-size 64]
4.60 vs 4.66
- SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
SKVQ beats RTN · PPL [3bit, group-size 64]
4.63 vs 4.98
- SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
SKVQ beats RTN · PPL [2bit, group-size 64]
4.87 vs 26.83
- WindowQuant: Mixed-Precision KV Cache Quantization based on Window-Level Similarity for VLMs Inference Optimization
WindowQuant beats RTN · Avg [LLaVA-v1.5-7B]
48.5 vs 44.4
- Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression
DecoQuant beats RTN · Average [activations only, W 16-A 8]
83.7 vs 83.6
- Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression
DecoQuant beats RTN · Average [activations only, W 16-A 4]
82.9 vs 81.6
- Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression
DecoQuant beats RTN · Average [activations only, W 16-A 2]
35.9 vs 2.3
- Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression
DecoQuant beats RTN · Average [weights & activations, W 8-A 8]
83.7 vs 83.6
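The bracketed settings above (bit-width, group size) are the knobs of the RTN baseline. As a rough illustration only, the sketch below assumes asymmetric min/max quantization with one scale and one zero point per group; the function names are hypothetical, and the cited papers may configure their RTN baselines differently (for example, per-channel versus per-token grouping).

```python
def rtn_quantize_group(values, bits):
    """Round-to-nearest quantization of one group (assumed asymmetric min/max)."""
    levels = (1 << bits) - 1              # highest integer code, e.g. 3 for 2-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Map each value onto the integer grid, round to nearest, clamp to range.
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo               # lo acts as the zero point


def rtn_dequantize_group(codes, scale, zero):
    """Reconstruct approximate floats from integer codes."""
    return [c * scale + zero for c in codes]


def rtn_roundtrip(values, bits, group_size):
    """Quantize and dequantize a flat list, one group at a time."""
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        codes, scale, zero = rtn_quantize_group(group, bits)
        out.extend(rtn_dequantize_group(codes, scale, zero))
    return out


def max_abs_error(original, restored):
    """Largest absolute reconstruction error."""
    return max(abs(x - y) for x, y in zip(original, restored))


if __name__ == "__main__":
    # Toy data standing in for one row of a key or value cache.
    row = [i / 10 for i in range(-32, 32)]
    for bits in (4, 3, 2):
        err = max_abs_error(row, rtn_roundtrip(row, bits, group_size=64))
        print(f"{bits}-bit, group-size 64: max abs error = {err:.3f}")
```

Within a group, the reconstruction error is bounded by half the step size, and the step size roughly doubles with each bit removed. This is consistent with the pattern in the tables above, where RTN stays close to the newer methods at 4 bits and falls far behind at 2 bits.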
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- SpectrumKV · SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving · Jun 7, 2026
- Hurwitz Quaternion Multiplicative Quantization (HQMQ) · Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression · May 26, 2026
- May 18, 2026
- May 18, 2026
- TriAxialKV · TriAxialKV: Toward Extreme Low-Precision KV-Cache Quantization for Agentic Inference Tasks · May 16, 2026
- KVServe · KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving · May 13, 2026
- WindowQuant · WindowQuant: Mixed-Precision KV Cache Quantization based on Window-Level Similarity for VLMs Inference Optimization · May 4, 2026
- Apr 21, 2026
- eOptShrinkQ · eOptShrinkQ: Near-Lossless KV Cache Compression Through Optimal Spectral Denoising and Quantization · Apr 6, 2026
- Apr 3, 2026
- Mar 30, 2026
- Mar 29, 2026