Superseded baseline · #62 of 234 most-superseded
Lexico
Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries
KV-cache compression · first seen Dec 12, 2024
superseded — cited as a baseline and beaten by newer methods
3 papers critique it · 0 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Lexico as a baseline.
“Frameworks such as Lexico kim2024lexicoextremekvcache introduce significant latency by relying on separate compression and decompression steps at every single decoding stage.”
— SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression
“unlike Lexico's uniform compression, we leverage the Semantic Elbow and Key-Value Asymmetry to dynamically allocate budgets---heavily compressing sparse routing information while preserving dense semantic content”
— Unlocking the Address Book: Dissecting the Sparse Semantic Structure of LLM Key-Value Caches via Sparse Autoencoders
“Unfortunately, this approach requires solving a computationally expensive matching pursuit algorithm for each key and value embedding, making Lexico relatively slow.”
— PolarQuant: Quantizing KV Caches with Polar Transformation
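To make the cost named in these critiques concrete, here is a minimal pure-Python sketch of generic matching pursuit over a fixed dictionary. It is an illustration only and not Lexico's implementation: the function names, the random dictionary, the vector dimension, and the sparsity budget are all assumptions chosen for the example. The point is that every key or value vector needs its own greedy loop, and each iteration scans all atoms.

```python
import math
import random


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def matching_pursuit(x, dictionary, sparsity):
    """Greedy matching pursuit with unit-norm atoms (hypothetical helper).

    Runs at most `sparsity` iterations. Returns (coeffs, residual), where
    coeffs maps atom index -> coefficient.
    """
    residual = list(x)
    coeffs = {}
    for _ in range(sparsity):
        # Correlate the residual with every atom: this full scan is the
        # per-vector, per-iteration cost the critiques refer to.
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda j: abs(scores[j]))
        if abs(scores[best]) < 1e-12:
            break  # residual is orthogonal to every atom; nothing left to fit
        coeffs[best] = coeffs.get(best, 0.0) + scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, dictionary[best])]
    return coeffs, residual


def reconstruct(coeffs, dictionary, dim):
    """Decompression step: rebuild the vector from its sparse code."""
    out = [0.0] * dim
    for j, c in coeffs.items():
        for i, a in enumerate(dictionary[j]):
            out[i] += c * a
    return out


# Toy setup (assumed sizes): 32 random unit-norm atoms in 8 dimensions.
random.seed(0)
dim, n_atoms = 8, 32
dictionary = []
for _ in range(n_atoms):
    atom = [random.gauss(0.0, 1.0) for _ in range(dim)]
    scale = norm(atom)
    dictionary.append([c / scale for c in atom])

key_vector = [random.gauss(0.0, 1.0) for _ in range(dim)]
coeffs, residual = matching_pursuit(key_vector, dictionary, sparsity=4)
approx = reconstruct(coeffs, dictionary, dim)
print("atoms used:", sorted(coeffs))
print("relative error:", norm(residual) / norm(key_vector))
```

In this sketch only the sparse code (`coeffs`) would be stored, so `reconstruct` has to run again whenever the vector is needed. That corresponds to the separate compression and decompression steps the SWAN authors describe.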
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 21, 2026
- May 8, 2026
- Mar 24, 2026
- Mar 17, 2026
- Mar 15, 2026
- Feb 5, 2026
- Jan 29, 2026
- GPU-accelerated INT8 quantization for KV cache compression · GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models · Jan 8, 2026
- STA-Attention · Unlocking the Address Book: Dissecting the Sparse Semantic Structure of LLM Key-Value Caches via Sparse Autoencoders · Dec 11, 2025
- SWAN · SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression · Nov 24, 2025
- Oct 28, 2025
- Sep 25, 2025