Living systematic review

Speculative decoding

Speeding up autoregressive LLM generation by drafting tokens cheaply and verifying them in parallel.

182 papers · 333 critique receipts · 1,849 benchmark results · updated Jun 18, 2026
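
The draft-then-verify idea in the tagline can be sketched with two toy models. This is a minimal illustration of the standard speculative sampling accept/reject rule, not the method of any specific paper listed here. The names `target_model`, `draft_model`, and `speculative_step`, the 4-token vocabulary, and the probabilities are all hypothetical.

```python
import random
from typing import Callable, List, Sequence

Dist = List[float]
Model = Callable[[Sequence[int]], Dist]  # context -> next-token probabilities


def sample(dist: Dist, rng: random.Random) -> int:
    # random.choices normalizes the weights, so dist need not sum to 1.
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def speculative_step(target: Model, draft: Model, context: Sequence[int],
                     k: int, rng: random.Random) -> List[int]:
    """One draft-then-verify round; emits between 1 and k + 1 tokens."""
    # 1. Draft k tokens cheaply, keeping the draft distribution at each step.
    drafted, q_dists = [], []
    ctx = list(context)
    for _ in range(k):
        q = draft(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2. Score every drafted prefix with the target. A real system would do
    #    this in one batched forward pass; the loop only keeps the toy simple.
    p_dists = [target(list(context) + drafted[:i]) for i in range(k + 1)]

    # 3. Accept drafted tokens left to right with probability min(1, p/q).
    out: List[int] = []
    for i, tok in enumerate(drafted):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            continue
        # Rejected: resample from the residual max(0, p - q) and stop.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(sample(residual, rng))
        return out

    # Every draft was accepted: take one bonus token from the target.
    out.append(sample(p_dists[k], rng))
    return out


# Hypothetical toy models over a 4-token vocabulary. Both favor
# (last token + 1) mod 4, but the draft is less confident than the target.
def target_model(ctx: Sequence[int]) -> Dist:
    nxt = (ctx[-1] + 1) % 4 if ctx else 0
    return [0.7 if t == nxt else 0.1 for t in range(4)]


def draft_model(ctx: Sequence[int]) -> Dist:
    nxt = (ctx[-1] + 1) % 4 if ctx else 0
    return [0.55 if t == nxt else 0.15 for t in range(4)]


rng = random.Random(0)
print(speculative_step(target_model, draft_model, context=[0], k=4, rng=rng))
```

The accept/resample rule is what keeps the output distribution equal to the target's, so any speedup comes from how many drafted tokens survive verification per round.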

Most-superseded baselines

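Methods ranked by the number of distinct papers that critique them or beat them on benchmarks. These are the standard baselines that newer work routinely measures against. The label after each method name is the sub-problem cluster it belongs to.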

1
EAGLE-3 · EAGLE-3
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
13 papers critique it · 28 beat it on benchmarks
2
EAGLE-2 · EAGLE-2
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
10 papers critique it · 21 beat it on benchmarks
3
EAGLE · EAGLE-2
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
19 papers critique it · 12 beat it on benchmarks
4
Medusa · EAGLE-2
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
19 papers critique it · 9 beat it on benchmarks
5
Lookahead · Lookahead
Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
8 papers critique it · 13 beat it on benchmarks
6
PLD · Lookahead
PLD+: Accelerating LLM inference by leveraging Language Model Artifacts
7 papers critique it · 13 beat it on benchmarks
7
REST · Lookahead
REST: Retrieval-Based Speculative Decoding
6 papers critique it · 9 beat it on benchmarks
8
SpecInfer · SpecInfer
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
10 papers critique it · 4 beat it on benchmarks
9
Speculative Sampling · Lookahead
Speculative Sampling for Parametric Temporal Point Processes
2 papers critique it · 10 beat it on benchmarks
10
DFlash · EAGLE-3
DFlash: Block Diffusion for Flash Speculative Decoding
4 papers critique it · 6 beat it on benchmarks
11
LayerSkip · SpecInfer
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
7 papers critique it · 3 beat it on benchmarks
12
FR-Spec · FR-Spec
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
4 papers critique it · 5 beat it on benchmarks
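
The ordering above can be reproduced from the two counts shown for each method. The sketch below is one scoring rule that happens to match the listed order, not the review's documented formula: it assumes the score is the sum of the two counts, with ties broken by the benchmark count. The page says "distinct papers", so the real score may count a paper that both critiques and beats a method only once. `Baseline` and `rank` are hypothetical names.

```python
from typing import List, NamedTuple


class Baseline(NamedTuple):
    name: str
    critiqued_by: int  # "N papers critique it"
    beaten_by: int     # "N beat it on benchmarks"


# Counts copied from the ranked list above.
BASELINES = [
    Baseline("EAGLE-3", 13, 28),
    Baseline("EAGLE-2", 10, 21),
    Baseline("EAGLE", 19, 12),
    Baseline("Medusa", 19, 9),
    Baseline("Lookahead", 8, 13),
    Baseline("PLD", 7, 13),
    Baseline("REST", 6, 9),
    Baseline("SpecInfer", 10, 4),
    Baseline("Speculative Sampling", 2, 10),
    Baseline("DFlash", 4, 6),
    Baseline("LayerSkip", 7, 3),
    Baseline("FR-Spec", 4, 5),
]


def rank(baselines: List[Baseline]) -> List[Baseline]:
    # Assumed rule: total mentions first, benchmark losses as the tie-break.
    return sorted(
        baselines,
        key=lambda b: (b.critiqued_by + b.beaten_by, b.beaten_by),
        reverse=True,
    )


for position, b in enumerate(rank(BASELINES), start=1):
    print(position, b.name, b.critiqued_by + b.beaten_by)
```

The tie-break matters in two places: EAGLE-2 and EAGLE both total 31, and DFlash and LayerSkip both total 10.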

Sub-problems

Methods that compete on the same benchmarks cluster into distinct sub-problems. Each cluster is named after its first-listed method, and only a sample of its members is shown.

EAGLE-2 · 54 methods

EAGLE-2 · EAGLE · Medusa · HASS · Hydra · SpecVLM

Lookahead · 47 methods

Lookahead · PLD · REST · Speculative Sampling · Token Recycling · PEARL

EAGLE-3 · 37 methods

EAGLE-3 · DFlash · DDD · DART · PARD · ParallelSpec

SpecInfer · 48 methods

SpecInfer · LayerSkip · SWIFT · SpecTr · Sequoia · AdaEDL

FR-Spec · 9 methods

FR-Spec · DynaSpec · SpecVocab · VocabTrim · CORAL · EvoSpec

SVIP · 9 methods

SVIP · AdaServe · DISCO · Dynamic Depth Decoding · SmartSpec · SpecServe

SSD · 8 methods

SSD · LLaDA · FastDLLM · improved sampling procedures · ReDi · FeF-DLLM

RSD · 8 methods

RSD · EARS · EASD · trained evaluation models in speculative decoding · From Tokens to Steps · Entropy-Aware Speculative Decoding (EASD)

EdgeLLM · 8 methods

EdgeLLM · SLED · DSD · DSSD · PicoSpec · HSL

SpecReason · 7 methods

SpecReason · LLM-as-a-judge for sequence-level verification · token-level speculative decoding · SpecThinking · SpecSampling · Lookahead Reasoning
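
One way to picture the grouping above is as connected components: two methods land in the same cluster when they are linked by a chain of shared benchmarks. The review does not say how its clusters are actually computed, so this is only a plausible reading. The function name `cluster_by_shared_benchmarks` and the benchmark labels `bench_a` to `bench_e` are hypothetical placeholders, not data from the review.

```python
from collections import defaultdict
from typing import Dict, List, Set


def cluster_by_shared_benchmarks(results: Dict[str, Set[str]]) -> List[List[str]]:
    """Group methods that are linked, directly or transitively, by a benchmark."""
    parent = {method: method for method in results}

    def find(m: str) -> str:
        # Union-find lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    # Merge each method with the first method seen on each of its benchmarks.
    first_seen: Dict[str, str] = {}
    for method, benchmarks in results.items():
        for bench in benchmarks:
            if bench in first_seen:
                parent[find(method)] = find(first_seen[bench])
            else:
                first_seen[bench] = method

    groups = defaultdict(list)
    for method in results:
        groups[find(method)].append(method)
    # Largest clusters first, then alphabetical for a stable order.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Placeholder benchmark assignments, chosen only to illustrate the mechanics.
RESULTS = {
    "EAGLE-2": {"bench_a", "bench_b"},
    "Medusa": {"bench_b"},
    "Lookahead": {"bench_c"},
    "REST": {"bench_c", "bench_d"},
    "SVIP": {"bench_e"},
}

print(cluster_by_shared_benchmarks(RESULTS))
```

Connected components have no overlap between clusters, which matches how each method in the ranked list carries exactly one cluster label.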

The frontier

Recent methods not yet superseded in the knowledge base.