SparseVLM
KV-cache compression
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 2 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites SparseVLM as a baseline.
“However, these strategies mainly tackle redundancy within visual tokens without addressing the interdependencies between text and visual tokens during multimodal long text generation [huang2025dynamicllava].”
— Hierarchical Adaptive Eviction for KV Cache Management in Multimodal Language Models
Beaten on benchmarks
Head-to-head results where a newer method reports beating SparseVLM. Values are copied from the source paper's tables — verify against the cited paper.
- Hierarchical Adaptive Eviction for KV Cache Management in Multimodal Language Models
HAE-LLaVA (Ours, Retain, 192) beats SparseVLM · GQA [LLaVA-1.5-7B with 192 visual tokens retained]
61.7 vs 57.6
- Hierarchical Adaptive Eviction for KV Cache Management in Multimodal Language Models
HAE-LLaVA (Ours, Retain, 192) beats SparseVLM · MMB [LLaVA-1.5-7B with 192 visual tokens retained]
64.6 vs 62.5
- Hierarchical Adaptive Eviction for KV Cache Management in Multimodal Language Models
HAE-LLaVA (Ours, Retain, 192) beats SparseVLM · SQA [LLaVA-1.5-7B with 192 visual tokens retained]
69.4 vs 69.1
- Hierarchical Adaptive Eviction for KV Cache Management in Multimodal Language Models
HAE-LLaVA (Ours, Retain, 192) beats SparseVLM · VQA2 [LLaVA-1.5-7B with 192 visual tokens retained]
78.1 vs 75.6
- Hierarchical Adaptive Eviction for KV Cache Management in Multimodal Language Models
HAE-LLaVA (Ours, Retain, 192) beats SparseVLM · TextVQA [LLaVA-1.5-7B with 192 visual tokens retained]
57.9 vs 56.1
- LightVLM: Acceleraing Large Multimodal Models with Pyramid Token Merging and KV Cache Compression
LightVLM beats SparseVLM · Avg [Keep 35% image tokens]
65.3 vs 62.5
- LightVLM: Acceleraing Large Multimodal Models with Pyramid Token Merging and KV Cache Compression
LightVLM beats SparseVLM · Avg [Keep 15% image tokens]
64.6 vs 57.6
- LightVLM: Acceleraing Large Multimodal Models with Pyramid Token Merging and KV Cache Compression
LightVLM beats SparseVLM · Avg [Keep 3% image tokens]
63.9 vs 51.2
- LightVLM: Acceleraing Large Multimodal Models with Pyramid Token Merging and KV Cache Compression
LightVLM beats SparseVLM · Avg [Keep 35% video tokens]
68.3 vs 65.5
- LightVLM: Acceleraing Large Multimodal Models with Pyramid Token Merging and KV Cache Compression
LightVLM beats SparseVLM · Avg [Keep 15% video tokens]
67.8 vs 63.2
- LightVLM: Acceleraing Large Multimodal Models with Pyramid Token Merging and KV Cache Compression
LightVLM beats SparseVLM · Avg [Keep 3% video tokens]
67.1 vs 54.9
- LightVLM: Acceleraing Large Multimodal Models with Pyramid Token Merging and KV Cache Compression
LightVLM beats SparseVLM · Avg [LLaVA Onevision 7B]
64.4 vs 61.8
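The LightVLM entries above sweep the fraction of tokens kept, so the reported margin over SparseVLM can be read as a function of compression strength. The following is a minimal sketch that only subtracts the values listed above; the names `image_rows`, `video_rows`, and `margins` are illustrative and do not come from either paper, and the "Avg" scores are taken as reported without assuming which benchmarks they average over.

```python
# (tokens kept, LightVLM Avg, SparseVLM Avg), copied from the entries above.
image_rows = [("35%", 65.3, 62.5), ("15%", 64.6, 57.6), ("3%", 63.9, 51.2)]
video_rows = [("35%", 68.3, 65.5), ("15%", 67.8, 63.2), ("3%", 67.1, 54.9)]


def margins(rows):
    """Return (tokens kept, margin) pairs, where margin is the newer
    method's score minus SparseVLM's, rounded to one decimal place."""
    return [(kept, round(new - base, 1)) for kept, new, base in rows]


image_margins = margins(image_rows)
video_margins = margins(video_rows)

for modality, result in (("image", image_margins), ("video", video_margins)):
    for kept, margin in result:
        print(f"{modality} tokens kept {kept}: +{margin} points over SparseVLM")
```

On these reported numbers the margin grows as fewer tokens are kept, for both image and video tokens. As with the entries themselves, verify the underlying values against the cited paper before relying on the differences.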
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 21, 2026
- KVCapsule · KVCapsule: Efficient Sequential KV Cache Compression for Vision-Language Models with Asymmetric Redundancy · May 14, 2026
- Decoupled Streaming Cache (DSCache) · Decouple and Cache: KV Cache Construction for Streaming Video Understanding · May 3, 2026
- May 1, 2026
- Hierarchical Adaptive Eviction (HAE) · Hierarchical Adaptive Eviction for KV Cache Management in Multimodal Language Models · Feb 2, 2026
- Dec 13, 2025
- StreamKV · StreamKV: Streaming Video Question-Answering with Segment-based KV Cache Retrieval and Compression · Nov 10, 2025