Is DFlash superseded?

DFlash (speculative decoding) is superseded: newer methods cite it as a baseline and report beating it. 4 papers critique it and 6 beat it on benchmarks, which ranks it #10 of 151 most-superseded. It belongs to the sub-problem cluster led by EAGLE-3. Newer alternatives in the same sub-problem include D^2SD, TreeFlash, Hybrid Verified Decoding, Bastion, and Draft-OPD.

DFlash

DFlash: Block Diffusion for Flash Speculative Decoding

Speculative decoding · first seen Feb 5, 2026

What papers say

Verbatim critique sentences, each from a paper that cites DFlash as a baseline.

“Limitation. DFlash acceptance length is fundamentally bounded by the longest correct prefix: once a mismatch occurs at some position, all subsequent draft tokens are discarded regardless of their quality.”
— D^2SD: Accelerating Speculative Decoding with Dual Diffusion Draft Models
“However, these one-shot drafters have a fundamental limitation: the predicted distribution for draft token $x_{t+i}$ is conditioned only on the prefix context $x_{\le t}$, with no dependence on preceding drafted tokens. This non-autoregressive conditioning causes the drafter's distribution to increasingly diverge from the verifier's true autoregressive distribution as draft depth grows.”
— TreeFlash: Parallel AR-Approximation for Faster Speculative Decoding
“Vanilla DFlash, however, explores only one continuation per round.”
— Accelerating Speculative Decoding with Block Diffusion Draft Trees
“While recent approaches such as DFlash and DART mitigate this issue with position-aware decaying weights, their weights are fixed and primarily position-dependent.”
— PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding
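
The first critique above (acceptance length bounded by the longest correct prefix) can be illustrated with a minimal Python sketch. It assumes greedy, exact-match verification, and the function name and token ids are hypothetical; it is not taken from DFlash or from any of the cited papers.

```python
from typing import List


def accepted_prefix_length(draft: List[int], target: List[int]) -> int:
    """Count draft tokens accepted under exact-match verification.

    Assumption: the verifier accepts draft tokens left to right and stops
    at the first position where the draft differs from its own choice.
    """
    accepted = 0
    for draft_token, target_token in zip(draft, target):
        if draft_token != target_token:
            break  # everything after the first mismatch is discarded
        accepted += 1
    return accepted


# Hypothetical token ids: the draft is wrong only at position 3 (0-indexed).
draft_tokens = [11, 42, 7, 99, 5, 8]
target_tokens = [11, 42, 7, 13, 5, 8]

# Positions 4 and 5 match the target but are still thrown away.
print(accepted_prefix_length(draft_tokens, target_tokens))  # 3
```

Under this assumption a single early mismatch caps the round at three accepted tokens, even though five of the six drafted tokens agree with the verifier.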

Beaten on benchmarks

Head-to-head results where a newer method reports beating DFlash. Values are copied from the source paper's tables — verify against the cited paper.

Domino beats DFlash · TPS, Domino vs DFlash
Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding

[Qwen3-4B, GSM8K]
concurrency=2 · 1256 vs 965
concurrency=4 · 2202 vs 1698
concurrency=8 · 3441 vs 2738
concurrency=16 · 4467 vs 3538
concurrency=32 · 5509 vs 4397

[Qwen3-4B, MBPP]
concurrency=2 · 968 vs 914
concurrency=4 · 1654 vs 1650
concurrency=8 · 2651 vs 2501
concurrency=16 · 3422 vs 3330
concurrency=32 · 4290 vs 4088

[Qwen3-8B, GSM8K]
concurrency=2 · 942 vs 672
concurrency=4 · 1703 vs 1243
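
To compare the head-to-head figures above on one scale, the following Python sketch computes the ratio of Domino TPS to DFlash TPS for each reported setting. The numbers are copied from the entries above; the dictionary layout and variable names are illustrative assumptions, and the ratios are only as reliable as the source tables.

```python
# (model, dataset, concurrency) -> (Domino TPS, DFlash TPS), copied from this page.
results = {
    ("Qwen3-4B", "GSM8K", 2): (1256, 965),
    ("Qwen3-4B", "GSM8K", 4): (2202, 1698),
    ("Qwen3-4B", "GSM8K", 8): (3441, 2738),
    ("Qwen3-4B", "GSM8K", 16): (4467, 3538),
    ("Qwen3-4B", "GSM8K", 32): (5509, 4397),
    ("Qwen3-4B", "MBPP", 2): (968, 914),
    ("Qwen3-4B", "MBPP", 4): (1654, 1650),
    ("Qwen3-4B", "MBPP", 8): (2651, 2501),
    ("Qwen3-4B", "MBPP", 16): (3422, 3330),
    ("Qwen3-4B", "MBPP", 32): (4290, 4088),
    ("Qwen3-8B", "GSM8K", 2): (942, 672),
    ("Qwen3-8B", "GSM8K", 4): (1703, 1243),
}

# Ratio above 1.0 means Domino reports higher throughput than DFlash.
ratio = {key: domino / dflash for key, (domino, dflash) in results.items()}

for (model, dataset, concurrency), value in ratio.items():
    print(f"{model} {dataset} concurrency={concurrency}: {value:.3f}x")

smallest = min(ratio, key=ratio.get)  # setting with the narrowest margin
largest = max(ratio, key=ratio.get)   # setting with the widest margin
print("narrowest margin:", smallest, f"{ratio[smallest]:.4f}x")
print("widest margin:", largest, f"{ratio[largest]:.4f}x")
```

The margin is far from uniform: it is widest on GSM8K with Qwen3-8B at concurrency=2 and nearly vanishes on MBPP with Qwen3-4B at concurrency=4 (1654 vs 1650).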

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base, include D^2SD, TreeFlash, Hybrid Verified Decoding, Bastion, and Draft-OPD.