SpecVLM
SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
Speculative decoding · first seen Aug 22, 2025
Superseded: cited as a baseline and beaten by newer methods.
2 papers critique it · 2 beat it on benchmarks
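The title names SpecVLM's core idea: the verifier (the target model) decides which video tokens the draft model gets to see. The sketch below illustrates one plausible reading of that idea, under loudly stated assumptions. It assumes the verifier exposes one attention score per visual token, and that pruning simply keeps the top-k highest-scoring tokens in their original order. The function name `prune_visual_tokens` and the `keep_ratio` parameter are hypothetical. This is not SpecVLM's actual code, and not necessarily its exact pruning rule.

```python
from typing import List, Sequence


def prune_visual_tokens(tokens: Sequence[int], scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Hypothetical sketch of verifier-guided pruning: keep the most-attended visual tokens.

    Assumption: scores[i] is the verifier's attention mass on tokens[i].
    Kept tokens are returned in their original order so the spatio-temporal
    layout of the surviving video tokens is preserved.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Keep at least one token when there is any input at all.
    k = max(1, int(len(tokens) * keep_ratio)) if tokens else 0
    # Indices of the k highest-scoring tokens; ties go to the earlier index.
    top = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))[:k]
    return [tokens[i] for i in sorted(top)]
```

For example, with six tokens and `keep_ratio=0.5`, the three tokens with the highest verifier scores survive. The draft model then processes half as many visual tokens, while the verifier still sees the full input.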
What papers say
Verbatim critique sentences, each from a paper that cites SpecVLM as a baseline.
“Their experiments with a small VLM draft model incorporating an image encoder yielded only marginal gains, highlighting the challenge of effectively processing visual information in the draft model due to the high redundancy and computational complexity of image inputs.”
— ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
“However, existing SD frameworks are fundamentally constrained by their exact-match rule: a draft token is accepted only if it is identical to the target model's generation.”
— See the Forest for the Trees: Loosely Speculative Decoding via Visual-Semantic Guidance for Efficient Inference of Video LLMs
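The exact-match rule criticized in the second quote can be illustrated with greedy (temperature 0) verification. The target model's own greedy choice is computed at every draft position, and draft tokens are accepted only up to the first mismatch. This is a minimal sketch. It assumes the target's predictions for all draft positions, plus one extra "bonus" position, are already available from a single forward pass. `verify_exact_match` is a hypothetical helper, not code from either paper.

```python
from typing import List, Sequence


def verify_exact_match(draft: Sequence[int], target: Sequence[int]) -> List[int]:
    """Greedy speculative-decoding verification under the exact-match rule.

    draft:  k token ids proposed by the draft model.
    target: k + 1 token ids the target model would emit greedily at each
            position (the last one follows the full draft: a "bonus" token).
    Returns the token ids committed in this decoding step.
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must have exactly len(draft) + 1 tokens")
    accepted: List[int] = []
    for d, t in zip(draft, target):
        if d != t:
            # First mismatch: discard this and all later draft tokens,
            # and commit the target's own token instead.
            accepted.append(t)
            return accepted
        accepted.append(d)
    # Every draft token matched, so the target's next token comes for free.
    accepted.append(target[-1])
    return accepted
```

Under this rule, a draft token that differs from the target's choice is rejected even if it is an equally valid continuation. The second quoted paper argues that this strictness limits acceptance.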
Beaten on benchmarks
Head-to-head results in which a newer method reports beating SpecVLM. Each pair is given as newer method vs SpecVLM. Values are copied from the source papers' tables, so verify them against the cited paper.
- See the Forest for the Trees: Loosely Speculative Decoding via Visual-Semantic Guidance for Efficient Inference of Video LLMs
The paper's proposed method (Ours) beats SpecVLM in each setting:
Speedup [Std.-SD Qwen2.5-VL]: 2.70 vs 2.00
Speedup [Self-SD Qwen2.5-VL]: 1.77 vs 1.53
Speedup [Std.-SD LLaVA-OV]: 2.94 vs 2.38
- FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks
DFlash beats SpecVLM in each setting:
Speed-up ratio [LLaVA-1.5, tau=0]: 1.83 vs 1.46
Speed-up ratio [LLaVA-1.5, tau=1]: 1.81 vs 1.40
Speed-up ratio [QwenVL-2.5, tau=0]: 2.68 vs 1.62
Speed-up ratio [QwenVL-2.5, tau=1]: 2.05 vs 1.58
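Each reported speedup is a ratio over the model's own autoregressive decoding. Dividing the newer method's speedup by SpecVLM's therefore gives how many times faster the newer method is in that setting. This is a small arithmetic sketch using the values copied from the benchmark results. It assumes both numbers in a pair were measured against the same baseline in the same setting. The source tables suggest this, but this page does not state it. `relative_gain` and `RESULTS` are illustrative names.

```python
from typing import List, Tuple

# (setting, newer method's speedup, SpecVLM's speedup), copied from the results list.
RESULTS: List[Tuple[str, float, float]] = [
    ("See the Forest, Std.-SD Qwen2.5-VL", 2.70, 2.00),
    ("See the Forest, Self-SD Qwen2.5-VL", 1.77, 1.53),
    ("See the Forest, Std.-SD LLaVA-OV", 2.94, 2.38),
    ("FLASH, LLaVA-1.5, tau=0", 1.83, 1.46),
    ("FLASH, LLaVA-1.5, tau=1", 1.81, 1.40),
    ("FLASH, QwenVL-2.5, tau=0", 2.68, 1.62),
    ("FLASH, QwenVL-2.5, tau=1", 2.05, 1.58),
]


def relative_gain(new: float, old: float) -> float:
    """How many times faster the newer method is than SpecVLM.

    Assumes both speedups share the same autoregressive baseline, so the
    baseline latency cancels out of the ratio.
    """
    return new / old


for setting, new, old in RESULTS:
    print(f"{setting}: {relative_gain(new, old):.2f}x")
```

For example, 2.68 vs 1.62 on QwenVL-2.5 at tau=0 works out to about 1.65x, the widest gap in the list. The narrowest gap is Self-SD on Qwen2.5-VL, at about 1.16x.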
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- DREAM-S · DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation · May 30, 2026
- May 14, 2026
- SpecForge · SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding · Mar 19, 2026
- Mar 13, 2026
- Feb 17, 2026
- Oct 22, 2025
- Oct 22, 2025
- Oct 17, 2025
- Draft, Verify, & Improve (DVI) · Draft, Verify, and Improve: Toward Training-Aware Speculative Decoding · Oct 6, 2025
- FastGRPO · FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning · Sep 26, 2025