REST
REST: Retrieval-Based Speculative Decoding · Speculative decoding · first seen Nov 14, 2023
Superseded: cited as a baseline and beaten by newer methods.
6 papers critique it · 9 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites REST as a baseline.
“LLM-A [yang2023LLMA] and ReST [he2023rest] generate drafts from reference texts, potentially reducing latency, but face database limitations, distribution gaps, and reliance on greedy decoding.”
Source: Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
“These training-free approaches are highly effective for tasks with high repetition (e.g., code editing) but struggle with open-ended generation where context reuse is minimal.”
Source: Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification
“Although REST can achieve a high draft token acceptance rate, the static nature of the datastore introduces a new challenge regarding storage space. REST stores the entire text of a pre-training dataset as-is, and the way to improve the accuracy of REST is to simply append more text to the datastore. However, this grows the size of the datastore unboundedly, motivating the need for a method to 'compact' a datastore.”
Source: CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding
“REST and Ouroboros reuse outputs or databases but depend on resource quality”
Source: Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
“REST uses suffixed arrays, which provides better complexity than PLD, but still not optimal complexity.”
Source: SAM Decoding: Speculative Decoding via Suffix Automaton
“While user-friendly, it is less effective compared to draft model-based methods.”
Source: RASD: Retrieval-Augmented Speculative Decoding
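Several critiques above refer to REST's core mechanism: draft tokens are retrieved from a static datastore by exact-matching a suffix of the current context through a suffix array, and the target model then checks them. The Python below is a minimal, hypothetical sketch of that idea on a toy word-level datastore, not REST's implementation. REST builds a trie from many retrieved continuations and verifies them in one pass with tree attention; this sketch keeps only the most frequent continuation and verifies it greedily against a stand-in target. The names `build_suffix_array`, `find_matches`, `retrieve_draft`, `greedy_verify`, `toy_next_token` and the parameters `max_suffix` and `draft_len` are illustrative assumptions.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence


def build_suffix_array(tokens: Sequence[str]) -> List[int]:
    # Naive construction: sort suffix start positions by the suffix itself.
    # Far too slow for a real datastore; fine for a toy example.
    return sorted(range(len(tokens)), key=lambda i: tuple(tokens[i:]))


def find_matches(tokens: Sequence[str], sa: List[int], pattern: Sequence[str]) -> List[int]:
    # Two binary searches bracket the suffixes whose first len(pattern) tokens equal pattern.
    pat = tuple(pattern)
    m = len(pat)

    def prefix(k: int) -> tuple:
        return tuple(tokens[sa[k]:sa[k] + m])

    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose prefix is >= pattern
        mid = (lo + hi) // 2
        if prefix(mid) < pat:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose prefix is > pattern
        mid = (lo + hi) // 2
        if prefix(mid) <= pat:
            lo = mid + 1
        else:
            hi = mid
    return [sa[k] for k in range(start, lo)]


def retrieve_draft(tokens: Sequence[str], sa: List[int], context: Sequence[str],
                   max_suffix: int = 4, draft_len: int = 3) -> List[str]:
    # Try the longest context suffix first and shorten it until something matches.
    for n in range(min(max_suffix, len(context)), 0, -1):
        positions = find_matches(tokens, sa, context[-n:])
        continuations = Counter(
            tuple(tokens[p + n:p + n + draft_len])
            for p in positions
            if p + n < len(tokens)  # skip matches at the very end of the datastore
        )
        if continuations:
            # Keep only the most frequent continuation (REST keeps many, in a trie).
            return list(continuations.most_common(1)[0][0])
    return []  # no match: this step falls back to ordinary decoding


def greedy_verify(context: Sequence[str], draft: Sequence[str],
                  next_token: Callable[[List[str]], str]) -> List[str]:
    # Accept draft tokens while they equal the target's greedy choice. The first
    # mismatch is replaced by the target's token; if all match, the target adds one more.
    # A real system scores every draft position in one forward pass; calling
    # next_token per position gives the same decisions for a deterministic target.
    emitted: List[str] = []
    for tok in draft:
        pred = next_token(list(context) + emitted)
        emitted.append(pred)
        if pred != tok:
            return emitted
    emitted.append(next_token(list(context) + emitted))
    return emitted


# Toy datastore and a stand-in "target model" (both hypothetical).
datastore = "the cat sat on the mat . the cat sat on the rug . the dog ran .".split()
sa = build_suffix_array(datastore)
TARGET_NEXT: Dict[str, str] = {"cat": "sat", "sat": "on", "on": "a", "a": "mat"}


def toy_next_token(ctx: List[str]) -> str:
    return TARGET_NEXT.get(ctx[-1], ".")


context = ["yes", "the", "cat"]
draft = retrieve_draft(datastore, sa, context)           # ['sat', 'on', 'the']
emitted = greedy_verify(context, draft, toy_next_token)  # ['sat', 'on', 'a']
print(draft, emitted)
```

The three-token suffix `yes the cat` has no match, so retrieval falls back to `the cat`, whose two occurrences share the continuation `sat on the`. The toy target accepts `sat on`, rejects `the`, and substitutes its own token, so one verification step emits three tokens. Averaging that per-step count over a run gives a figure comparable to the MAT values reported in the benchmark list, assuming MAT also counts the target model's own token (consistent with every reported value exceeding 1). The sketch also shows why the critiques stress datastore coverage: when no suffix matches, `retrieve_draft` returns an empty draft and the step yields a single token.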
Beaten on benchmarks
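Head-to-head results where a newer method reports beating REST. MAT is the mean number of tokens accepted per decoding step; Speedup is the wall-clock speedup over standard autoregressive decoding. Values are copied from the source paper's tables; verify them against the cited paper.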
- LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
  LogitSpec beats REST on every reported model (LogitSpec vs REST):
  - Llama 2 7B · MAT 2.41 vs 1.71 · Speedup 2.02 vs 1.33
  - Llama 2 13B · MAT 2.43 vs 1.71 · Speedup 2.03 vs 1.34
  - Llama 2 70B · MAT 2.67 vs 1.67 · Speedup 2.10 vs 1.35
  - Vicuna 7B · MAT 3.28 vs 1.64 · Speedup 2.61 vs 1.19
  - Vicuna 13B · MAT 2.90 vs 1.65 · Speedup 2.17 vs 1.22
  - Vicuna 33B · MAT 2.66 vs 1.65 · Speedup 2.01 vs 1.41
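The two metrics in the LogitSpec results can be related with a little arithmetic. The Python sketch below copies the six rows listed above and computes, per model, how many times faster LogitSpec is than REST, plus a derived "step cost" equal to MAT divided by Speedup. That second quantity rests on an assumption not stated in the source: if both runs generate the same tokens and each autoregressive step costs the same, then Speedup = MAT × (autoregressive step time) / (average speculative step time), so MAT / Speedup estimates how much longer one draft-and-verify step takes than a plain decoding step. The function name `summarize` is illustrative.

```python
from typing import Dict, List, Tuple

# (model, MAT LogitSpec, MAT REST, Speedup LogitSpec, Speedup REST), copied from the list.
RESULTS: List[Tuple[str, float, float, float, float]] = [
    ("Llama 2 7B", 2.41, 1.71, 2.02, 1.33),
    ("Llama 2 13B", 2.43, 1.71, 2.03, 1.34),
    ("Llama 2 70B", 2.67, 1.67, 2.10, 1.35),
    ("Vicuna 7B", 3.28, 1.64, 2.61, 1.19),
    ("Vicuna 13B", 2.90, 1.65, 2.17, 1.22),
    ("Vicuna 33B", 2.66, 1.65, 2.01, 1.41),
]


def summarize(rows: List[Tuple[str, float, float, float, float]]) -> Dict[str, Dict[str, float]]:
    out: Dict[str, Dict[str, float]] = {}
    for model, mat_ls, mat_rest, sp_ls, sp_rest in rows:
        out[model] = {
            # How many times faster LogitSpec is than REST in wall-clock terms.
            "speedup_ratio": sp_ls / sp_rest,
            # Average step cost relative to one autoregressive step, ASSUMING
            # speedup = MAT * (AR step time) / (average speculative step time).
            "step_cost_logitspec": mat_ls / sp_ls,
            "step_cost_rest": mat_rest / sp_rest,
        }
    return out


summary = summarize(RESULTS)
for model, s in summary.items():
    print(f"{model:12s} speedup ratio {s['speedup_ratio']:.2f}  "
          f"step cost LogitSpec {s['step_cost_logitspec']:.2f}  REST {s['step_cost_rest']:.2f}")
```

Under that assumption, the step cost comes out above 1 for both methods on every model, meaning a draft-and-verify step is slower than an ordinary decoding step and the gains come from emitting more tokens per step. The largest speedup ratio is on Vicuna 7B (2.61 vs 1.19).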
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Collaborative Speculative Decoding (CoSpec) · Beyond the Target: From Imitation to Collaboration in Speculative Decoding · May 24, 2026
- ToolSpec · ToolSpec: Accelerating Tool Calling via Schema-Aware and Retrieval-Augmented Speculative Decoding · Apr 15, 2026
- Quasar · Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification · Mar 2, 2026
- FLy (Training-Free Loosely Speculative Decoding) · Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match · Nov 28, 2025
- Nov 1, 2025
- Oct 30, 2025
- Oct 22, 2025
- Oct 8, 2025
- Group Tree Optimization (GTO) · Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding · Sep 26, 2025
- Sep 22, 2025