DFlash
DFlash: Block Diffusion for Flash Speculative Decoding
Speculative decoding · first seen Feb 5, 2026
superseded — cited as a baseline and beaten by newer methods
4 papers critique it · 6 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites DFlash as a baseline.
“Limitation. DFlash acceptance length is fundamentally bounded by the longest correct prefix: once a mismatch occurs at some position, all subsequent draft tokens are discarded regardless of their quality.”
Source: D^2SD: Accelerating Speculative Decoding with Dual Diffusion Draft Models
“However, these one-shot drafters have a fundamental limitation: the predicted distribution for draft token $x_{t+i}$ is conditioned only on the prefix context $x_{\le t}$, with no dependence on preceding drafted tokens. This non-autoregressive conditioning causes the drafter's distribution to increasingly diverge from the verifier's true autoregressive distribution as draft depth grows.”
Source: TreeFlash: Parallel AR-Approximation for Faster Speculative Decoding
“Vanilla DFlash, however, explores only one continuation per round.”
Source: Accelerating Speculative Decoding with Block Diffusion Draft Trees
“While recent approaches such as DFlash and DART mitigate this issue with position-aware decaying weights, their weights are fixed and primarily position-dependent.”
Source: PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding
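The prefix-bound critique and the one-continuation critique both come down to how a draft is verified. The sketch below is a minimal illustration under stated assumptions, not DFlash's implementation: it assumes greedy (argmax) verification and models the target as a hypothetical `target_next(prefix)` function. Real speculative decoders score the whole block in a single target forward pass rather than token by token, and a flat list of candidate paths stands in here for a draft tree.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical greedy target model: maps a token prefix to its argmax next token.
TargetNext = Callable[[Tuple[int, ...]], int]


def verify_block(context: Sequence[int], draft: Sequence[int],
                 target_next: TargetNext) -> List[int]:
    """Greedy verification of one draft block; returns the tokens committed this round.

    Draft tokens are accepted left to right until the first mismatch. At a mismatch
    the target's own token is committed and every later draft token is dropped, so
    the accepted length is capped by the longest correct prefix.
    """
    committed: List[int] = []
    prefix = tuple(context)
    for tok in draft:
        expected = target_next(prefix)
        if tok != expected:
            committed.append(expected)  # correction token; rest of the draft is discarded
            return committed
        committed.append(tok)
        prefix += (tok,)
    committed.append(target_next(prefix))  # bonus token when the whole block is accepted
    return committed


def verify_candidates(context: Sequence[int], candidates: Sequence[Sequence[int]],
                      target_next: TargetNext) -> List[int]:
    """Verify several candidate continuations and keep the one committing the most tokens.

    A flat list of paths stands in for a draft tree (a tree just shares common prefixes).
    """
    return max((verify_block(context, c, target_next) for c in candidates), key=len)


if __name__ == "__main__":
    # Toy target that always predicts "previous token + 1" (mod 10).
    def count_up(prefix: Tuple[int, ...]) -> int:
        return (prefix[-1] + 1) % 10

    print(verify_block((0,), [1, 2, 9, 4], count_up))                      # [1, 2, 3]
    print(verify_candidates((0,), [[1, 2, 9, 4], [1, 2, 3, 5]], count_up))  # [1, 2, 3, 4]
```

With the toy target, the single draft `[1, 2, 9, 4]` commits only `[1, 2, 3]`: the trailing `4` is thrown away even though it would have matched after the correction. Verifying a second candidate, `[1, 2, 3, 5]`, commits four tokens, which is the kind of gain tree-style drafting aims for.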
Beaten on benchmarks
Head-to-head results where a newer method reports beating DFlash. Values are copied from the source paper's tables — verify against the cited paper.
- Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding
Domino beats DFlash on TPS in each setting below:

| Model | Dataset | Concurrency | Domino TPS | DFlash TPS |
|---|---|---|---|---|
| Qwen3-4B | GSM8K | 2 | 1256 | 965 |
| Qwen3-4B | GSM8K | 4 | 2202 | 1698 |
| Qwen3-4B | GSM8K | 8 | 3441 | 2738 |
| Qwen3-4B | GSM8K | 16 | 4467 | 3538 |
| Qwen3-4B | GSM8K | 32 | 5509 | 4397 |
| Qwen3-4B | MBPP | 2 | 968 | 914 |
| Qwen3-4B | MBPP | 4 | 1654 | 1650 |
| Qwen3-4B | MBPP | 8 | 2651 | 2501 |
| Qwen3-4B | MBPP | 16 | 3422 | 3330 |
| Qwen3-4B | MBPP | 32 | 4290 | 4088 |
| Qwen3-8B | GSM8K | 2 | 942 | 672 |
| Qwen3-8B | GSM8K | 4 | 1703 | 1243 |
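The raw TPS pairs are easier to compare as relative gains, (Domino − DFlash) / DFlash. This short script does that arithmetic on the twelve Domino vs DFlash pairs listed above; the dictionary layout and the helper names (`RESULTS`, `relative_gain_pct`, `gains`) are illustrative choices, not anything from the cited paper.

```python
# Throughput (TPS) pairs copied from the Domino results listed above:
# (model, dataset, concurrency) -> (Domino TPS, DFlash TPS)
RESULTS = {
    ("Qwen3-4B", "GSM8K", 2): (1256, 965),
    ("Qwen3-4B", "GSM8K", 4): (2202, 1698),
    ("Qwen3-4B", "GSM8K", 8): (3441, 2738),
    ("Qwen3-4B", "GSM8K", 16): (4467, 3538),
    ("Qwen3-4B", "GSM8K", 32): (5509, 4397),
    ("Qwen3-4B", "MBPP", 2): (968, 914),
    ("Qwen3-4B", "MBPP", 4): (1654, 1650),
    ("Qwen3-4B", "MBPP", 8): (2651, 2501),
    ("Qwen3-4B", "MBPP", 16): (3422, 3330),
    ("Qwen3-4B", "MBPP", 32): (4290, 4088),
    ("Qwen3-8B", "GSM8K", 2): (942, 672),
    ("Qwen3-8B", "GSM8K", 4): (1703, 1243),
}


def relative_gain_pct(new: float, base: float) -> float:
    """Percentage improvement of `new` over `base`."""
    return (new - base) / base * 100.0


def gains(results):
    """Map each setting to Domino's percentage gain over DFlash."""
    return {key: relative_gain_pct(domino, dflash)
            for key, (domino, dflash) in results.items()}


if __name__ == "__main__":
    for (model, dataset, conc), g in gains(RESULTS).items():
        print(f"{model:9} {dataset:6} c={conc:<3} +{g:.1f}%")
```

On these figures the GSM8K margins run from about 25% to 40%, while the MBPP margins are no more than about 6%, and the concurrency-4 MBPP case is nearly a tie (about 0.2%).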
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Jun 3, 2026
- Jun 2, 2026
- Hybrid Verified Decoding: Learning to Allocate Verification in Speculative Decoding · May 31, 2026
- May 28, 2026
- May 28, 2026
- May 28, 2026
- May 19, 2026
- May 19, 2026
- May 9, 2026
- May 8, 2026
- May 1, 2026
- Apr 21, 2026