Is SparseVLM superseded?

SparseVLM (KV-cache compression) is superseded: it is cited as a baseline and beaten by newer methods. One paper critiques it and two report beating it on benchmarks, which ranks it #53 of 234 most-superseded methods. It belongs to the sub-problem cluster led by ReKV. Newer alternatives in the same sub-problem include MuKV, KVCapsule, Decoupled Streaming Cache (DSCache), LightKV, and Hierarchical Adaptive Eviction (HAE).

What papers say

Verbatim critique sentences, each from a paper that cites SparseVLM as a baseline.

“However, these strategies mainly tackle redundancy within visual tokens without addressing the interdependencies between text and visual tokens during multimodal long text generation [huang2025dynamicllava].”
— Hierarchical Adaptive Eviction for KV Cache Management in Multimodal Language Models

Beaten on benchmarks

Head-to-head results where a newer method reports beating SparseVLM. Values are copied from the source paper's tables — verify against the cited paper.

Hierarchical Adaptive Eviction for KV Cache Management in Multimodal Language Models
HAE-LLaVA (Ours, Retain, 192) beats SparseVLM [LLaVA-1.5-7B with 192 visual tokens retained]

Benchmark   HAE-LLaVA   SparseVLM
GQA         61.7        57.6
MMB         64.6        62.5
SQA         69.4        69.1
VQA2        78.1        75.6
TextVQA     57.9        56.1

LightVLM: Acceleraing Large Multimodal Models with Pyramid Token Merging and KV Cache Compression
LightVLM beats SparseVLM on Avg

Setting                 LightVLM   SparseVLM
Keep 35% image tokens   65.3       62.5
Keep 15% image tokens   64.6       57.6
Keep 3% image tokens    63.9       51.2
Keep 35% video tokens   68.3       65.5
Keep 15% video tokens   67.8       63.2
Keep 3% video tokens    67.1       54.9
LLaVA Onevision 7B      64.4       61.8
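
The head-to-head values above are easier to compare as margins. The following is a minimal Python sketch that computes the gap between each newer method and SparseVLM. The scores are copied from the entries above; the dictionary names and the `margins` helper are illustrative and not part of either paper.

```python
# Scores copied from the head-to-head entries above, stored as
# (newer method, SparseVLM). All names here are illustrative.
HAE_VS_SPARSEVLM = {
    "GQA": (61.7, 57.6),
    "MMB": (64.6, 62.5),
    "SQA": (69.4, 69.1),
    "VQA2": (78.1, 75.6),
    "TextVQA": (57.9, 56.1),
}
LIGHTVLM_IMAGE_AVG = {"35%": (65.3, 62.5), "15%": (64.6, 57.6), "3%": (63.9, 51.2)}
LIGHTVLM_VIDEO_AVG = {"35%": (68.3, 65.5), "15%": (67.8, 63.2), "3%": (67.1, 54.9)}


def margins(results):
    """Return the newer method's lead over SparseVLM, rounded to one decimal."""
    return {key: round(new - base, 1) for key, (new, base) in results.items()}


if __name__ == "__main__":
    for label, table in [
        ("HAE-LLaVA, 192 visual tokens", HAE_VS_SPARSEVLM),
        ("LightVLM, image tokens kept", LIGHTVLM_IMAGE_AVG),
        ("LightVLM, video tokens kept", LIGHTVLM_VIDEO_AVG),
    ]:
        print(label, margins(table))
```

Within these reported numbers, the LightVLM margin grows as fewer tokens are kept (for example, 2.8 points at 35% of image tokens versus 12.7 at 3%), while the HAE-LLaVA margins range from 0.3 (SQA) to 4.1 (GQA). As with the values themselves, the margins should be checked against the cited papers.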

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.