Speculative Sampling
Speculative Sampling for Parametric Temporal Point Processes · Speculative decoding · first seen Oct 22, 2025
superseded — cited as a baseline and beaten by newer methods
2 papers critique it · 10 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Speculative Sampling as a baseline.
“vanilla speculative decoding often suffers from high drafting latency, which can account for over 75% and 60% of the total inference time when using Qwen3-1.7B to accelerate Qwen3-14B and Qwen3-32B with draft length 5.”
— DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference
“The draft model with limited capacity struggles to precisely approximate the large-scale target model.”
— EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Beaten on benchmarks
Head-to-head results where a newer method reports beating Speculative Sampling. Values are copied from the source paper's tables — verify against the cited paper.
- Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding
+ Reflect Verify beats Speculative Sampling · Speed [Llama3.2-1B-Instruct & Llama3.1-8B-Instruct]
1.36 vs 1.18
- Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding
+ Reflect Verify beats Speculative Sampling · Speed [Llama3.1-8B-Instruct & Llama3.1-70B-Instruct]
2.24 vs 2.08
- ECHO: Elastic Speculative Decoding with Sparse Gating for High-Concurrency Scenarios
ECHO beats Speculative Sampling · Avg. Speedup [Vicuna-13B]
5.25 vs 1.76
- Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-Base-8B GSM8K]
2.24 vs 1
- Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-Base-8B HumanEval]
2.40 vs 1
- Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-Base-8B MBPP]
2.39 vs 1
- Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-Base-8B MATH]
2.27 vs 1
- Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-Instruct-8B GSM8K]
2.37 vs 1
- Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-Instruct-8B HumanEval]
2.29 vs 1
- Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-Instruct-8B MBPP]
2.30 vs 1
- Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-Instruct-8B MATH]
2.28 vs 1
- Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Spiffy beats Speculative Sampling · Speedup [LLaDA-1.5-8B GSM8K]
2.40 vs 1
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Collaborative Speculative Decoding (CoSpec) · Beyond the Target: From Imitation to Collaboration in Speculative Decoding · May 24, 2026
- ToolSpec · ToolSpec: Accelerating Tool Calling via Schema-Aware and Retrieval-Augmented Speculative Decoding · Apr 15, 2026
- Quasar · Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification · Mar 2, 2026
- FLy (Training-Free Loosely Speculative Decoding) · Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match · Nov 28, 2025
- Nov 1, 2025
- Oct 30, 2025
- Oct 22, 2025
- Oct 8, 2025
- Group Tree Optimization (GTO) · Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding · Sep 26, 2025
- Sep 22, 2025