Fast-dLLM
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
KV-cache compression · first seen May 28, 2025
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 2 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Fast-dLLM as a baseline.
“although Fast-dLLM wu2025 and Elastic-Cache Tri2025 can provide certain acceleration benefits for short-text tasks, their throughput rapidly deteriorates as context length grows, often accompanied by further reductions in accuracy.”
— WaveFilter: Enhancing the Long-Context Capability of Diffusion LLMs via Wavelet-Guided KV Cache Filtering
Beaten on benchmarks
Head-to-head results where a newer method reports beating Fast-dLLM. Values are copied from the source paper's tables — verify against the cited paper.
- Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction
Sparse-dLLM (ours) beats Fast-dLLM · Throughput (TPS) [LLaDA-8B-Instruct]
3.4 vs 2.2
- Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction
Sparse-dLLM (ours) beats Fast-dLLM · Throughput (TPS) [Dream-v0-7B-Instruct]
3.6 vs 2.3
- Attention Is All You Need for KV Cache in Diffusion LLMs
Elastic-Cache beats Fast-dLLM · speedup vs baseline [GSM8K 5-shot, 512 Gen Length, Confident-Aware Decoding]
25.2 vs 12.3
- Attention Is All You Need for KV Cache in Diffusion LLMs
Elastic-Cache beats Fast-dLLM · speedup vs baseline [MATH 4-shot, 512 Gen Length, Confident-Aware Decoding]
7.9 vs 7.4
- Attention Is All You Need for KV Cache in Diffusion LLMs
Elastic-Cache beats Fast-dLLM · speedup vs baseline [HumanEval 0-shot, 512 Gen Length, Confident-Aware Decoding]
5.0 vs 4.3
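To compare the margins at a glance, the reported pairs above can be turned into simple ratios. The sketch below is illustrative only: the dictionary keys and the `relative_gain` helper are names made up for this example, the numbers are the ones listed above, and the ratios are only meaningful within a row, since the Sparse-dLLM rows report throughput while the Elastic-Cache rows report speedup over each paper's own baseline.

```python
# Head-to-head values copied from the entries above, as (newer method, Fast-dLLM).
# The labels are hypothetical shorthand for this example, not names from the papers.
results = {
    "Sparse-dLLM, TPS, LLaDA-8B-Instruct": (3.4, 2.2),
    "Sparse-dLLM, TPS, Dream-v0-7B-Instruct": (3.6, 2.3),
    "Elastic-Cache, speedup, GSM8K 5-shot": (25.2, 12.3),
    "Elastic-Cache, speedup, MATH 4-shot": (7.9, 7.4),
    "Elastic-Cache, speedup, HumanEval 0-shot": (5.0, 4.3),
}


def relative_gain(newer: float, baseline: float) -> float:
    """Return how many times larger the newer value is than the baseline value."""
    return newer / baseline


# Ratio of the newer method's reported number to Fast-dLLM's, rounded to 2 places.
gains = {
    name: round(relative_gain(newer, baseline), 2)
    for name, (newer, baseline) in results.items()
}

for name, gain in gains.items():
    print(f"{name}: {gain:.2f}x Fast-dLLM")
```

On these numbers the margin is narrowest on MATH (roughly 1.07x) and widest on GSM8K (roughly 2.05x). As the note above says, verify the underlying values against the cited papers before relying on them.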
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.