Is Speculative Sampling superseded?

Speculative Sampling (Speculative decoding): superseded, meaning it is cited as a baseline and beaten by newer methods. 2 papers critique it and 10 beat it on benchmarks, which ranks it #9 of 151 most-superseded. Sub-problem: cluster led by Lookahead. Newer alternatives in the same sub-problem include Collaborative Speculative Decoding (CoSpec), ToolSpec, Quasar, FLy (Training-Free Loosely Speculative Decoding), and Pivot-Aware Speculative Decoding.

Superseded baseline · #9 of 151 most-superseded

Speculative Sampling

Speculative Sampling for Parametric Temporal Point Processes

Speculative decoding · first seen Oct 22, 2025

What papers say

Verbatim critique sentences, each from a paper that cites Speculative Sampling as a baseline.

“vanilla speculative decoding often suffers from high drafting latency, which can account for over 75% and 60% of the total inference time when using Qwen3-1.7B to accelerate Qwen3-14B and Qwen3-32B with draft length 5.”
— DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference
“The draft model with limited capacity struggles to precisely approximate the large-scale target model.”
— EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
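
The second critique can be made concrete with a small sketch. Under the standard speculative sampling rule, a token x drafted from the draft distribution q is accepted with probability min(1, p(x)/q(x)), where p is the target distribution, so the expected acceptance rate is the sum over tokens of min(p(x), q(x)). The distributions below are made-up toy values chosen for illustration, not figures from any cited paper, and the function name is hypothetical.

```python
def acceptance_rate(target, draft):
    """Expected probability that one drafted token is accepted.

    Assumes the standard rule: accept x ~ draft with probability
    min(1, target[x] / draft[x]), which averages to sum(min(p, q)).
    """
    if len(target) != len(draft):
        raise ValueError("distributions must cover the same vocabulary")
    return sum(min(p, q) for p, q in zip(target, draft))


# Toy three-token vocabulary (illustrative values only).
target = [0.5, 0.3, 0.2]
close_draft = [0.45, 0.35, 0.20]  # draft that tracks the target well
poor_draft = [0.1, 0.1, 0.8]      # draft that approximates it badly

print(f"close draft: {acceptance_rate(target, close_draft):.2f}")
print(f"poor draft:  {acceptance_rate(target, poor_draft):.2f}")
```

The poorer the draft's approximation of the target, the fewer drafted tokens survive verification, which is the capacity limitation the critique points at.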

Beaten on benchmarks

Head-to-head results where a newer method reports beating Speculative Sampling. Values are copied from the source paper's tables — verify against the cited paper.

+ Reflect Verify beats Speculative Sampling · Speed [Llama3.2-1B-Instruct & Llama3.1-8B-Instruct]
1.36 vs 1.18
Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding
+ Reflect Verify beats Speculative Sampling · Speed [Llama3.1-8B-Instruct & Llama3.1-70B-Instruct]
2.24 vs 2.08
Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding
ECHO beats Speculative Sampling · Avg. Speedup [Vicuna-13B]
5.25 vs 1.76
ECHO: Elastic Speculative Decoding with Sparse Gating for High-Concurrency Scenarios
Spiffy beats Speculative Sampling · Speedup [LLaDA-Base-8B GSM8K]
2.24 vs 1
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-Base-8B HumanEval]
2.40 vs 1
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-Base-8B MBPP]
2.39 vs 1
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-Base-8B MATH]
2.27 vs 1
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-Instruct-8B GSM8K]
2.37 vs 1
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-Instruct-8B HumanEval]
2.29 vs 1
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-Instruct-8B MBPP]
2.30 vs 1
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-Instruct-8B MATH]
2.28 vs 1
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-1.5-8B GSM8K]
2.40 vs 1
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
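
To compare the pairs above on one scale, the sketch below divides each newer method's value by the Speculative Sampling value from the same row. It assumes both numbers in a pair use the same unit and that higher is better; the `relative_gain` helper is a hypothetical name, and only a few rows from the list are included. Ratios from different papers are not directly comparable, since each paper uses its own models and hardware.

```python
# (label, newer method value, Speculative Sampling value), copied from the rows above.
results = [
    ("+ Reflect Verify [Llama3.2-1B-Instruct & Llama3.1-8B-Instruct]", 1.36, 1.18),
    ("+ Reflect Verify [Llama3.1-8B-Instruct & Llama3.1-70B-Instruct]", 2.24, 2.08),
    ("ECHO [Vicuna-13B]", 5.25, 1.76),
    ("Spiffy [LLaDA-Base-8B GSM8K]", 2.24, 1.0),
]


def relative_gain(new_value, baseline_value):
    """Ratio of the newer method's value to the baseline's value."""
    if baseline_value <= 0:
        raise ValueError("baseline value must be positive")
    return new_value / baseline_value


for label, new_value, baseline_value in results:
    print(f"{label}: {relative_gain(new_value, baseline_value):.2f}x over baseline")
```

Where the baseline is listed as 1, the ratio equals the reported speedup itself, so the Spiffy rows are unchanged by this normalization.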

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.