Living systematic review
Speculative decoding
Speeding up autoregressive LLM generation by drafting tokens cheaply and verifying them in parallel.
182 papers · 333 critique receipts · 1,849 benchmark results · updated Jun 18, 2026
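The draft-then-verify idea in the tagline can be sketched as follows. This is a minimal illustration of the greedy (exact-match) variant only, not the method of any paper listed here. The names `toy_target`, `toy_draft`, `speculative_step`, and `generate` are hypothetical, and the "models" are toy next-token functions over small integers. A real system would score all drafted positions in one batched forward pass of the target model; the loop below only mimics that check.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function (assumption)


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical 'large' model: always counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Hypothetical 'small' model: agrees with the target except after token 2."""
    return 0 if ctx[-1] == 2 else (ctx[-1] + 1) % 10


def speculative_step(prefix: List[Token], draft: NextToken,
                     target: NextToken, k: int) -> List[Token]:
    """Draft k tokens cheaply, then keep the longest prefix the target agrees with."""
    # 1) Drafting: the cheap model proposes k tokens autoregressively.
    proposal: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2) Verification: compare each drafted token with the target's greedy choice.
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: emit the target's token instead and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # All k drafts accepted: the same target pass also yields one extra token.
    accepted.append(target(ctx))
    return accepted


def generate(prefix: List[Token], draft: NextToken, target: NextToken,
             k: int, max_new: int) -> List[Token]:
    """Repeat speculative steps until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[:len(prefix) + max_new]
```

Under greedy verification the output matches what the target model alone would produce, and each step yields between 1 and k + 1 tokens depending on how many drafts are accepted.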
Most-superseded baselines
Ranked by how many distinct papers critique or beat each method. These are the standard baselines newer work routinely measures against.
- 1. EAGLE-3 · EAGLE-3 · EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
13 papers critique it · 28 beat it on benchmarks
- 2. EAGLE-2 · EAGLE-2 · EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
10 papers critique it · 21 beat it on benchmarks
- 3. EAGLE · EAGLE-2 · EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
19 papers critique it · 12 beat it on benchmarks
- 4. Medusa · EAGLE-2 · Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
19 papers critique it · 9 beat it on benchmarks
- 5. Lookahead · Lookahead · Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
8 papers critique it · 13 beat it on benchmarks
- 6. PLD · Lookahead · PLD+: Accelerating LLM inference by leveraging Language Model Artifacts
7 papers critique it · 13 beat it on benchmarks
- 7. REST · Lookahead · REST: Retrieval-Based Speculative Decoding
6 papers critique it · 9 beat it on benchmarks
- 8. SpecInfer · SpecInfer · SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
10 papers critique it · 4 beat it on benchmarks
- 9. Speculative Sampling · Lookahead · Speculative Sampling for Parametric Temporal Point Processes
2 papers critique it · 10 beat it on benchmarks
- 10. DFlash · EAGLE-3 · DFlash: Block Diffusion for Flash Speculative Decoding
4 papers critique it · 6 beat it on benchmarks
- 11. LayerSkip · SpecInfer · LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
7 papers critique it · 3 beat it on benchmarks
- 12. FR-Spec · FR-Spec · FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
4 papers critique it · 5 beat it on benchmarks
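The ordering above can be checked with a short sketch. It assumes the rank key is the sum of the two counts shown for each method. That is an assumption: the page says "distinct papers", so the real key may count a paper only once when it both critiques and beats a method. The counts are copied from the list above, and the names `counts` and `ranked` are illustrative.

```python
# (critique count, beat count) per method, in the order listed above.
counts = {
    "EAGLE-3": (13, 28),
    "EAGLE-2": (10, 21),
    "EAGLE": (19, 12),
    "Medusa": (19, 9),
    "Lookahead": (8, 13),
    "PLD": (7, 13),
    "REST": (6, 9),
    "SpecInfer": (10, 4),
    "Speculative Sampling": (2, 10),
    "DFlash": (4, 6),
    "LayerSkip": (7, 3),
    "FR-Spec": (4, 5),
}

# Assumed key: critiques + beats, descending. Python's sort is stable,
# so tied methods keep the order in which they are listed.
ranked = sorted(counts, key=lambda method: -sum(counts[method]))

for position, method in enumerate(ranked, start=1):
    critiques, beats = counts[method]
    print(f"{position:2d}. {method}: {critiques} + {beats} = {critiques + beats}")
```

Under this assumed key the listed order is reproduced, with two ties (EAGLE-2 and EAGLE, DFlash and LayerSkip) left in their listed order.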
Sub-problems
Methods that compete on the same benchmarks cluster into distinct sub-problems.
Lookahead · 47 methods
Lookahead · PLD · REST · Speculative Sampling · Token Recycling · PEARL
RSD · 8 methods
RSD · EARS · EASD · trained evaluation models in speculative decoding · From Tokens to Steps · Entropy-Aware Speculative Decoding (EASD)
SpecReason · 7 methods
SpecReason · LLM-as-a-judge for sequence-level verification · token-level speculative decoding · SpecThinking · SpecSampling · Lookahead Reasoning
The frontier
Recent methods not yet superseded in the knowledge base.
- Jun 4, 2026
- Jun 3, 2026
- Jun 2, 2026
- Hybrid Verified Decoding · Hybrid Verified Decoding: Learning to Allocate Verification in Speculative Decoding · May 31, 2026
- DREAM-S · DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation · May 30, 2026
- May 29, 2026
- May 28, 2026
- May 28, 2026
- May 28, 2026
- Collaborative Speculative Decoding (CoSpec) · Beyond the Target: From Imitation to Collaboration in Speculative Decoding · May 24, 2026
- May 19, 2026
- May 19, 2026