PLD
PLD+: Accelerating LLM inference by leveraging Language Model Artifacts · Speculative decoding · first seen Dec 2, 2024
superseded — cited as a baseline and beaten by newer methods
7 papers critique it · 13 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites PLD as a baseline.
“existing model-free approaches, such as prompt-lookup decoding (PLD)~saxena2023prompt, achieve low overhead and rapid token generation, but typically lack adaptivity.”
Source: SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications
“there are no matched tokens in more than 30% of decoding steps in PLD”
Source: LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
“These training-free approaches are highly effective for tasks with high repetition (e.g., code editing) but struggle with open-ended generation where context reuse is minimal.”
Source: Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification
“pattern-repeating scenarios such as code generation, but can only propose a single continuation at a time. Moreover, because pattern matches are sparse and fail to capture the full diversity of target model outputs, PLD is constrained to specific domains and cannot generalize broadly.”
Source: RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding
“Prompt- and retrieval-based approaches (PLD, Lookahead, CLLMs) improve draft quality but degrade with scarce context”
Source: Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
“PLD~pld-saxena-2023 focuses on current text while REST~rest-he-2024 uses a text corpus.”
Source: SAM Decoding: Speculative Decoding via Suffix Automaton
“However, it cannot predict new tokens or their combinations.”
Source: RASD: Retrieval-Augmented Speculative Decoding
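Several of the critiques above follow from how PLD proposes drafts: it copies the tokens that followed an earlier occurrence of the current n-gram in the context, so it has nothing to propose when no match exists, and it yields a single continuation per step. The sketch below illustrates that lookup together with greedy verification. The names `propose_draft` and `accepted_prefix_len` and the defaults (`max_ngram=3`, `num_draft=10`) are hypothetical, not the reference implementation. This sketch prefers the most recent earlier match; other implementations may pick a different one.

```python
from typing import List, Sequence


def propose_draft(tokens: Sequence[int], max_ngram: int = 3,
                  min_ngram: int = 1, num_draft: int = 10) -> List[int]:
    """Hypothetical prompt-lookup style draft proposal.

    Tries the longest trailing n-gram first. For each n, scans the earlier
    context right to left for a previous occurrence of the last n tokens and
    returns up to `num_draft` tokens that followed it. Returns [] when nothing
    matches, in which case the step falls back to ordinary decoding.
    """
    length = len(tokens)
    for n in range(max_ngram, min_ngram - 1, -1):
        if n >= length:
            continue  # no room for an earlier occurrence of this n-gram
        suffix = list(tokens[length - n:])
        # Start positions exclude the trailing n-gram itself.
        for start in range(length - n - 1, -1, -1):
            if list(tokens[start:start + n]) == suffix:
                follow = list(tokens[start + n:start + n + num_draft])
                if follow:
                    return follow  # one continuation only, never a tree
    return []


def accepted_prefix_len(draft: Sequence[int], target_next: Sequence[int]) -> int:
    """Count leading draft tokens that match the target model's greedy picks.

    `target_next[i]` is assumed to be the target model's argmax token at draft
    position i, obtained from a single verification forward pass.
    """
    count = 0
    for proposed, expected in zip(draft, target_next):
        if proposed != expected:
            break
        count += 1
    return count
```

For example, `propose_draft([5, 6, 7, 8, 9, 1, 2, 5, 6, 7], num_draft=4)` returns `[8, 9, 1, 2]`. By contrast, `propose_draft([1, 2, 3, 4])` returns `[]` because no earlier n-gram matches, which is the "no matched tokens" situation LogitSpec measures.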
Beaten on benchmarks
Head-to-head results where a newer method reports beating PLD. Values are copied from the source paper's tables — verify against the cited paper.
- SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications
suffix (tree) beats PLD · mean accepted tokens per step [AgenticSQL]
6.236 vs 2.373
- SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications
suffix (tree) beats PLD · speedup over vanilla [AgenticSQL]
5.175 vs 2.105
- DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference
DART beats PLD · Speedup [L2 7B Temperature=0]
2.85 vs 1.74
- LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · MAT (mean accepted tokens per step) [Llama 2 7B]
2.41 vs 1.89
- LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · Speedup [Llama 2 7B]
2.02 vs 1.73
- LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · MAT [Llama 2 13B]
2.43 vs 1.89
- LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · Speedup [Llama 2 13B]
2.03 vs 1.52
- LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · MAT [Llama 2 70B]
2.67 vs 1.98
- LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · Speedup [Llama 2 70B]
2.10 vs 1.74
- LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · MAT [Vicuna 7B]
3.28 vs 2.61
- LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · Speedup [Vicuna 7B]
2.61 vs 2.26
- LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · MAT [Vicuna 13B]
2.90 vs 2.34
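The entries above report raw values, so the size of each win is easier to compare as a ratio of the newer method's value to PLD's (1.0 means parity). The sketch below does that arithmetic on the listed numbers only. The `Result` record and `ratio_over_pld` helper are hypothetical names introduced for illustration, and the values should be re-checked against the cited papers before reuse.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    method: str
    metric: str
    setting: str
    new: float  # value reported for the newer method
    pld: float  # value reported for PLD in the same table


# Values copied from the "Beaten on benchmarks" list; verify against the papers.
RESULTS: List[Result] = [
    Result("suffix (tree)", "mean accepted tokens per step", "AgenticSQL", 6.236, 2.373),
    Result("suffix (tree)", "speedup over vanilla", "AgenticSQL", 5.175, 2.105),
    Result("DART", "Speedup", "L2 7B Temperature=0", 2.85, 1.74),
    Result("LogitSpec", "MAT", "Llama 2 7B", 2.41, 1.89),
    Result("LogitSpec", "Speedup", "Llama 2 7B", 2.02, 1.73),
    Result("LogitSpec", "MAT", "Llama 2 13B", 2.43, 1.89),
    Result("LogitSpec", "Speedup", "Llama 2 13B", 2.03, 1.52),
    Result("LogitSpec", "MAT", "Llama 2 70B", 2.67, 1.98),
    Result("LogitSpec", "Speedup", "Llama 2 70B", 2.10, 1.74),
    Result("LogitSpec", "MAT", "Vicuna 7B", 3.28, 2.61),
    Result("LogitSpec", "Speedup", "Vicuna 7B", 2.61, 2.26),
    Result("LogitSpec", "MAT", "Vicuna 13B", 2.90, 2.34),
]


def ratio_over_pld(r: Result) -> float:
    """How many times larger the newer method's value is than PLD's."""
    return r.new / r.pld


if __name__ == "__main__":
    for r in RESULTS:
        print(f"{r.method:14s} {r.metric:30s} {r.setting:22s} {ratio_over_pld(r):.2f}x")
```

For instance, LogitSpec on Llama 2 7B improves MAT by roughly 1.28x but speedup by roughly 1.17x; wall-clock speedup also reflects per-step costs that MAT does not capture, so the two ratios need not agree.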
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Collaborative Speculative Decoding (CoSpec) · Beyond the Target: From Imitation to Collaboration in Speculative Decoding · May 24, 2026
- ToolSpec · ToolSpec: Accelerating Tool Calling via Schema-Aware and Retrieval-Augmented Speculative Decoding · Apr 15, 2026
- Quasar · Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification · Mar 2, 2026
- FLy (Training-Free Loosely Speculative Decoding) · Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match · Nov 28, 2025
- Nov 1, 2025
- Oct 30, 2025
- Oct 22, 2025
- Oct 8, 2025
- Group Tree Optimization (GTO) · Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding · Sep 26, 2025
- Sep 22, 2025