RTN (round-to-nearest quantization), KV-cache compression: superseded, i.e. cited as a baseline and beaten by newer methods. 0 papers critique it and 4 beat it on benchmarks, placing it #39 of 234 most-superseded. Sub-problem: the cluster led by KIVI. Newer alternatives in the same sub-problem include SpectrumKV, Hurwitz Quaternion Multiplicative Quantization (HQMQ), OSCAR, OScaR, and TriAxialKV.

Superseded baseline · #39 of 234 most-superseded

RTN

KV-cache compression


0 papers critique it · 4 beat it on benchmarks
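RTN is round-to-nearest quantization: each value is mapped to the closest point on a uniform grid, with no calibration, channel reordering, or outlier handling. The sketch below is a minimal illustration only, assuming asymmetric min/max scaling over contiguous groups of a flat vector. The function names are hypothetical, and the grouping axis and scaling rule behind each cited paper's RTN baseline may differ. It mirrors the "group size" and bit-width settings that appear in the benchmark rows.

```python
import math


def rtn_quantize_group(values, bits):
    """Asymmetric round-to-nearest quantization of one group.

    Returns integer codes plus the scale and zero point needed to dequantize.
    """
    qmax = 2 ** bits - 1  # e.g. 3 for 2-bit, 15 for 4-bit
    lo, hi = min(values), max(values)
    # Avoid dividing by zero when every value in the group is identical.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    # Snap each value to the nearest grid point (Python's round() breaks ties to even).
    codes = [min(qmax, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def rtn_dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate real values."""
    return [c * scale + zero_point for c in codes]


def rtn_fake_quantize(vector, bits, group_size):
    """Quantize then dequantize a flat vector in contiguous groups of group_size."""
    out = []
    for start in range(0, len(vector), group_size):
        group = vector[start:start + group_size]
        codes, scale, zero_point = rtn_quantize_group(group, bits)
        out.extend(rtn_dequantize_group(codes, scale, zero_point))
    return out


def mse(a, b):
    """Mean squared reconstruction error."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical stand-in for one row of a key or value cache.
    row = [math.sin(0.37 * i) * (1 + i % 7) for i in range(256)]
    for bits in (4, 3, 2):
        approx = rtn_fake_quantize(row, bits, group_size=128)
        print(f"{bits}-bit RTN, group size 128: MSE = {mse(row, approx):.4f}")
```

Running it shows reconstruction error growing as the bit width drops. The 2-bit settings are also where the results in this section separate most sharply from RTN.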

Beaten on benchmarks

Head-to-head results where a newer method reports beating RTN. Values are copied from the source papers' tables; verify them against the cited paper.

SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
SKVQ beats RTN · Average [Llama-2-7B-chat, group size 128, 2-bit key cache, 2-bit value cache, window size 128]: SKVQ 37.50 vs RTN 6.76
SKVQ beats RTN · Average [Llama-2-13B-chat, group size 128, 2-bit key cache, 2-bit value cache, window size 128]: SKVQ 37.53 vs RTN 10.15
SKVQ beats RTN · Average [Mistral-7B, group size 128, 2-bit key cache, 2-bit value cache, window size 128]: SKVQ 43.47 vs RTN 15.58
SKVQ beats RTN · Average [Mistral-7B-Instruct, group size 128, 2-bit key cache, 2-bit value cache, window size 128]: SKVQ 46.23 vs RTN 18.02
SKVQ beats RTN · PPL, lower is better [4-bit, group size 64]: SKVQ 4.60 vs RTN 4.66
SKVQ beats RTN · PPL, lower is better [3-bit, group size 64]: SKVQ 4.63 vs RTN 4.98
SKVQ beats RTN · PPL, lower is better [2-bit, group size 64]: SKVQ 4.87 vs RTN 26.83
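The SKVQ rows fix a window size of 128. As the paper's title indicates, SKVQ uses a sliding window: the most recent tokens in the KV cache are kept at higher precision while older tokens are quantized. The paper also reorders channels and clips quantization ranges, and this sketch omits both. What follows is a minimal, hypothetical illustration of the window idea only. It assumes per-token groups along the feature dimension, full precision inside the window, and plain RTN for older tokens. `quantize_cache_with_window` is an illustrative name, not SKVQ's code.

```python
import math


def rtn_fake_quantize(vector, bits, group_size):
    """Asymmetric round-to-nearest quantize-then-dequantize over contiguous groups."""
    qmax = 2 ** bits - 1
    out = []
    for start in range(0, len(vector), group_size):
        group = vector[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        out.extend(min(qmax, max(0, round((v - lo) / scale))) * scale + lo for v in group)
    return out


def quantize_cache_with_window(cache_rows, bits, group_size, window_size):
    """Hypothetical helper: RTN-quantize all but the newest `window_size` tokens.

    cache_rows is a list of per-token vectors ordered oldest to newest.
    """
    n_old = max(0, len(cache_rows) - window_size)
    old = [rtn_fake_quantize(row, bits, group_size) for row in cache_rows[:n_old]]
    recent = [list(row) for row in cache_rows[n_old:]]  # full precision in this sketch
    return old + recent


if __name__ == "__main__":
    # 200 synthetic tokens with 256 features each; window of 128 as in the SKVQ rows.
    cache = [[math.sin(0.1 * t + 0.37 * d) for d in range(256)] for t in range(200)]
    out = quantize_cache_with_window(cache, bits=2, group_size=128, window_size=128)
    changed = sum(1 for a, b in zip(cache, out) if a != b)
    print(f"{changed} of {len(cache)} tokens quantized")
```

In a real decoder the window slides: each new token enters at high precision, and the token leaving the window is quantized once. This sketch instead quantizes a static snapshot of the cache.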
WindowQuant: Mixed-Precision KV Cache Quantization based on Window-Level Similarity for VLMs Inference Optimization
WindowQuant beats RTN · Avg [LLaVA-v1.5-7B]: WindowQuant 48.5 vs RTN 44.4
Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression
DecoQuant beats RTN · Average [activations only, W16A8]: DecoQuant 83.7 vs RTN 83.6
DecoQuant beats RTN · Average [activations only, W16A4]: DecoQuant 82.9 vs RTN 81.6
DecoQuant beats RTN · Average [activations only, W16A2]: DecoQuant 35.9 vs RTN 2.3
DecoQuant beats RTN · Average [weights & activations, W8A8]: DecoQuant 83.7 vs RTN 83.6
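What "beats" means depends on the metric. The Average and Avg rows are scores where higher is better, while PPL (perplexity) is lower-is-better. The small sketch below recomputes each margin from the values listed in this section under those assumed directions. The row labels are abbreviations chosen here, not the papers' own.

```python
# (method, metric, setting, higher_is_better, new_value, rtn_value),
# values copied from the rows in this section; setting labels are abbreviated.
ROWS = [
    ("SKVQ", "Average", "Llama-2-7B-chat, K2/V2, g128, w128", True, 37.50, 6.76),
    ("SKVQ", "Average", "Llama-2-13B-chat, K2/V2, g128, w128", True, 37.53, 10.15),
    ("SKVQ", "Average", "Mistral-7B, K2/V2, g128, w128", True, 43.47, 15.58),
    ("SKVQ", "Average", "Mistral-7B-Instruct, K2/V2, g128, w128", True, 46.23, 18.02),
    ("SKVQ", "PPL", "4-bit, g64", False, 4.60, 4.66),
    ("SKVQ", "PPL", "3-bit, g64", False, 4.63, 4.98),
    ("SKVQ", "PPL", "2-bit, g64", False, 4.87, 26.83),
    ("WindowQuant", "Avg", "LLaVA-v1.5-7B", True, 48.5, 44.4),
    ("DecoQuant", "Average", "activations only, W16A8", True, 83.7, 83.6),
    ("DecoQuant", "Average", "activations only, W16A4", True, 82.9, 81.6),
    ("DecoQuant", "Average", "activations only, W16A2", True, 35.9, 2.3),
    ("DecoQuant", "Average", "weights & activations, W8A8", True, 83.7, 83.6),
]


def margin(higher_is_better, new_value, rtn_value):
    """Positive when the newer method is better than RTN under the metric's direction."""
    return new_value - rtn_value if higher_is_better else rtn_value - new_value


if __name__ == "__main__":
    for method, metric, setting, higher, new, rtn in ROWS:
        print(f"{method:<12} {metric:<8} {setting:<40} margin {margin(higher, new, rtn):+.2f}")
```

Two of the margins (the W16A8 and W8A8 rows) are only 0.1 points. Whether a gap that small is meaningful depends on the source paper's evaluation setup.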

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.