PLD (Speculative decoding): superseded, meaning it is cited as a baseline and beaten by newer methods. 7 papers critique it and 13 beat it on benchmarks, placing it #6 of 151 most-superseded. Sub-problem: cluster led by Lookahead. Newer alternatives in the same sub-problem include Collaborative Speculative Decoding (CoSpec), ToolSpec, Quasar, FLy (Training-Free Loosely Speculative Decoding), and Pivot-Aware Speculative Decoding.

Superseded baseline · #6 of 151 most-superseded

PLD

PLD+: Accelerating LLM inference by leveraging Language Model Artifacts

Speculative decoding · first seen Dec 2, 2024

superseded — cited as a baseline and beaten by newer methods

7 papers critique it · 13 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites PLD as a baseline.

“existing model-free approaches, such as prompt-lookup decoding (PLD) [saxena2023prompt], achieve low overhead and rapid token generation, but typically lack adaptivity.”
— SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications
“there are no matched tokens in more than 30% of decoding steps in PLD”
— LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
“These training-free approaches are highly effective for tasks with high repetition (e.g., code editing) but struggle with open-ended generation where context reuse is minimal.”
— Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification
“pattern-repeating scenarios such as code generation, but can only propose a single continuation at a time. Moreover, because pattern matches are sparse and fail to capture the full diversity of target model outputs, PLD is constrained to specific domains and cannot generalize broadly.”
— RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding
“Prompt- and retrieval-based approaches (PLD, Lookahead, CLLMs) improve draft quality but degrade with scarce context”
— Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
“PLD [pld-saxena-2023] focuses on current text while REST [rest-he-2024] uses a text corpus.”
— SAM Decoding: Speculative Decoding via Suffix Automaton
“However, it cannot predict new tokens or their combinations.”
— RASD: Retrieval-Augmented Speculative Decoding
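
The critiques above target one mechanism: PLD drafts tokens by matching the trailing n-gram of the sequence against earlier context and copying what followed. The sketch below illustrates that idea under stated assumptions. The names `prompt_lookup_draft`, `max_ngram`, and `num_draft` are hypothetical, and preferring the longest n-gram and the most recent match is an assumption, not a detail confirmed by the cited papers. An empty return corresponds to the "no matched tokens" case, and the single returned list reflects the "single continuation" limitation.

```python
from typing import List, Sequence


def prompt_lookup_draft(
    tokens: Sequence[int], max_ngram: int = 3, num_draft: int = 5
) -> List[int]:
    """Propose draft tokens by copying what followed an earlier n-gram match.

    Illustrative sketch only: names and tie-breaking rules are assumptions.
    """
    # Try the longest n-gram first, then fall back to shorter ones.
    for n in range(min(max_ngram, len(tokens) - 1), 0, -1):
        suffix = list(tokens[-n:])
        # Scan right to left so the most recent earlier occurrence wins.
        # The last valid start leaves at least one token after the match.
        for start in range(len(tokens) - n - 1, -1, -1):
            if list(tokens[start:start + n]) == suffix:
                return list(tokens[start + n:start + n + num_draft])
    # No earlier occurrence of any trailing n-gram: nothing to draft.
    return []


# The trailing [1, 2, 3] also appears at the start, followed by 4, 5.
print(prompt_lookup_draft([1, 2, 3, 4, 5, 1, 2, 3], num_draft=2))  # [4, 5]
# No token repeats, so there is no draft and decoding proceeds one token at a time.
print(prompt_lookup_draft([1, 2, 3, 4]))  # []
```

Because the draft can only copy spans that already exist in the context, it cannot produce a token sequence that has not appeared before, which is consistent with the RASD and Quasar critiques.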

Beaten on benchmarks

Head-to-head results where a newer method reports beating PLD. Values are copied from the source paper's tables — verify against the cited paper.

suffix (tree) beats PLD · mean accepted tokens per step [AgenticSQL]
6.236 vs 2.373
SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications
suffix (tree) beats PLD · speedup over vanilla [AgenticSQL]
5.175 vs 2.105
SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications
DART beats PLD · Speedup [L2 7B Temperature=0]
2.85 vs 1.74
DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference
LogitSpec beats PLD · MAT [Llama 2 7B]
2.41 vs 1.89
LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · Speedup [Llama 2 7B]
2.02 vs 1.73
LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · MAT [Llama 2 13B]
2.43 vs 1.89
LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · Speedup [Llama 2 13B]
2.03 vs 1.52
LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · MAT [Llama 2 70B]
2.67 vs 1.98
LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · Speedup [Llama 2 70B]
2.10 vs 1.74
LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · MAT [Vicuna 7B]
3.28 vs 2.61
LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · Speedup [Vicuna 7B]
2.61 vs 2.26
LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats PLD · MAT [Vicuna 13B]
2.90 vs 2.34
LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
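
To compare entries at a glance, each pair can be reduced to a ratio. When both speedups are measured against the same vanilla baseline in the same paper, their ratio is the new method's throughput relative to PLD. The sketch below uses three pairs copied from the entries above; the helper `relative_gain` is a hypothetical name, and ratios from different papers should not be compared with each other, since hardware and settings may differ.

```python
# (new method value, PLD value) pairs copied from the entries above.
results = {
    "suffix (tree) speedup [AgenticSQL]": (5.175, 2.105),
    "DART speedup [L2 7B Temperature=0]": (2.85, 1.74),
    "LogitSpec speedup [Llama 2 7B]": (2.02, 1.73),
}


def relative_gain(new: float, pld: float) -> float:
    """Ratio of the newer method's reported value to PLD's reported value."""
    return new / pld


for name, (new, pld) in results.items():
    print(f"{name}: {relative_gain(new, pld):.2f}x relative to PLD")
```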

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.