REST (Speculative decoding): superseded, i.e. cited as a baseline and beaten by newer methods. 6 papers critique it and 9 beat it on benchmarks, ranking #7 of 151 on the most-superseded list. Sub-problem: cluster led by Lookahead. Newer alternatives in the same sub-problem include Collaborative Speculative Decoding (CoSpec), ToolSpec, Quasar, FLy (Training-Free Loosely Speculative Decoding), and Pivot-Aware Speculative Decoding.

Superseded baseline · #7 of 151 most-superseded

REST

REST: Retrieval-Based Speculative Decoding

Speculative decoding · first seen Nov 14, 2023


6 papers critique it · 9 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites REST as a baseline.

“LLM-A~yang2023LLMA and ReST~he2023rest generate drafts from reference texts, potentially reducing latency, but face database limitations, distribution gaps, and reliance on greedy decoding.”
— Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
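Retrieval-based drafters such as REST only propose tokens; the target model still verifies them. The sketch below shows greedy verification under a stated assumption: the target model's argmax token at each drafted position is already available as a list (`target_argmax`; the function and parameter names are hypothetical). The longest draft prefix that agrees with the target is accepted, and the target's own token at the first mismatch is committed instead, so every step yields at least one token.

```python
from typing import List


def greedy_verify(draft: List[int], target_argmax: List[int]) -> List[int]:
    """Return the tokens committed in one draft-and-verify step under greedy decoding.

    Assumption: target_argmax[i] is the target model's argmax token given the
    context plus draft[:i], so it has len(draft) + 1 entries; the last entry is
    the target's own next token after a fully accepted draft.
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target_argmax must have len(draft) + 1 entries")
    accepted: List[int] = []
    for drafted, predicted in zip(draft, target_argmax):
        if drafted != predicted:
            # First disagreement: commit the target's token instead and stop.
            accepted.append(predicted)
            return accepted
        accepted.append(drafted)
    # Every draft token matched, so the target's next prediction is kept as well.
    accepted.append(target_argmax[-1])
    return accepted


print(greedy_verify([5, 7, 9], [5, 7, 2, 4]))  # [5, 7, 2]
print(greedy_verify([5, 7, 9], [5, 7, 9, 4]))  # [5, 7, 9, 4]
```

Exact-match acceptance like this reproduces the target's output only under greedy decoding; sampled decoding needs a probabilistic acceptance rule instead, which is the limitation the quote points to.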
“These training-free approaches are highly effective for tasks with high repetition (e.g., code editing) but struggle with open-ended generation where context reuse is minimal.”
— Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification
“Although REST can achieve a high draft token acceptance rate, the static nature of the datastore introduces a new challenge regarding storage space. REST stores the entire text of a pre-training dataset as-is, and the way to improve the accuracy of REST is to simply append more text to the datastore. However, this grows the size of the datastore unboundedly, motivating the need for a method to 'compact' a datastore.”
— CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding
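The quote describes REST's datastore as raw text that only grows. As a generic illustration of what "compacting" can mean, and explicitly not CREST's published method, the sketch below (all names hypothetical) replaces raw text with a table mapping each length-`n` token context to at most `top_k` of its most frequent next tokens. Repeated text then adds counts during construction rather than stored entries, so the result holds at most `top_k` tokens per distinct context.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Context = Tuple[int, ...]


def build_compact_table(
    corpus: Sequence[Sequence[int]], n: int = 2, top_k: int = 2
) -> Dict[Context, List[int]]:
    """Hypothetical compaction: keep at most top_k next tokens per length-n context.

    Raw text is discarded; the result holds at most top_k entries per distinct
    context, however many times that context repeats in the corpus.
    """
    counts: Dict[Context, Counter] = defaultdict(Counter)
    for doc in corpus:
        # Slide a window of n tokens and count the token that follows it.
        for i in range(len(doc) - n):
            context = tuple(doc[i : i + n])
            counts[context][doc[i + n]] += 1
    # most_common orders ties by first occurrence, so the pruning is deterministic.
    return {ctx: [tok for tok, _ in c.most_common(top_k)] for ctx, c in counts.items()}


table = build_compact_table([[1, 2, 3, 1, 2, 4, 1, 2, 3]], n=2, top_k=1)
print(table[(1, 2)])  # [3]
```

The trade-off is visible in the example: with `top_k=1`, the rarer continuation `4` after `(1, 2)` is dropped, so compaction saves space at the cost of some retrievable drafts.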
“REST and Ouroboros reuse outputs or databases but depend on resource quality”
— Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
“REST uses suffixed arrays, which provides better complexity than PLD, but still not optimal complexity.”
— SAM Decoding: Speculative Decoding via Suffix Automaton
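The suffix-array lookup the quote refers to can be sketched as follows, in the spirit of REST's exact-match retrieval and assuming the datastore is a single list of token IDs. The function names and the `max_suffix`/`draft_len` parameters are hypothetical. The sketch tries the longest context suffix first, shortens it until some datastore position matches, and returns the tokens following each match as candidate drafts. REST additionally merges candidates into a trie before verification; that step is omitted here.

```python
from typing import List, Sequence


def build_suffix_array(tokens: Sequence[int]) -> List[int]:
    """Start positions of all suffixes, sorted lexicographically.

    Naive O(N^2 log N) construction, adequate for a sketch; large datastores
    need a dedicated suffix-array builder.
    """
    return sorted(range(len(tokens)), key=lambda i: tuple(tokens[i:]))


def _bound(tokens: Sequence[int], sa: List[int], pattern: Sequence[int], strict: bool) -> int:
    """Binary search: first rank whose length-m prefix is >= pattern (> if strict)."""
    pat = tuple(pattern)
    m = len(pat)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = tuple(tokens[sa[mid] : sa[mid] + m])
        if prefix < pat or (strict and prefix == pat):
            lo = mid + 1
        else:
            hi = mid
    return lo


def retrieve_draft(
    tokens: Sequence[int], sa: List[int], context: Sequence[int],
    max_suffix: int = 4, draft_len: int = 3,
) -> List[List[int]]:
    """Continuations that follow the longest context suffix found in the datastore."""
    for n in range(min(max_suffix, len(context)), 0, -1):
        pattern = context[len(context) - n :]
        start = _bound(tokens, sa, pattern, strict=False)
        end = _bound(tokens, sa, pattern, strict=True)
        candidates = [
            list(tokens[i + n : i + n + draft_len])
            for i in sa[start:end]
            if i + n < len(tokens)  # skip matches with nothing after them
        ]
        if candidates:
            return candidates  # longest matching suffix wins
    return []


datastore = [1, 2, 3, 1, 2, 4, 1, 2, 3, 5]
sa = build_suffix_array(datastore)
print(retrieve_draft(datastore, sa, [9, 1, 2], max_suffix=3, draft_len=2))
# [[3, 1], [3, 5], [4, 1]]
```

Each lookup is two binary searches, so matching a pattern of length m against N datastore tokens costs O(m log N) token comparisons. This logarithmic factor is what the quote contrasts with PLD's scan and a suffix automaton's linear-time matching.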
“While user-friendly, it is less effective compared to draft model-based methods.”
— RASD: Retrieval-Augmented Speculative Decoding

Beaten on benchmarks

Head-to-head results where a newer method reports beating REST. Values are copied from the source paper's tables — verify against the cited paper.

LogitSpec beats REST on every model below. Each cell lists LogitSpec vs REST; MAT is mean accepted tokens per decoding step.
Source: LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation

Llama 2 7B · MAT 2.41 vs 1.71 · Speedup 2.02 vs 1.33
Llama 2 13B · MAT 2.43 vs 1.71 · Speedup 2.03 vs 1.34
Llama 2 70B · MAT 2.67 vs 1.67 · Speedup 2.10 vs 1.35
Vicuna 7B · MAT 3.28 vs 1.64 · Speedup 2.61 vs 1.19
Vicuna 13B · MAT 2.90 vs 1.65 · Speedup 2.17 vs 1.22
Vicuna 33B · MAT 2.66 vs 1.65 · Speedup 2.01 vs 1.41
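The reported pairs can be turned into relative gains with simple arithmetic. The snippet below only recomputes ratios from the values listed in this section (copied from the LogitSpec paper) and adds no new measurements; `RESULTS` and `relative_gain` are illustrative names.

```python
# (model, metric) -> (LogitSpec, REST), as listed in this section.
RESULTS = {
    ("Llama 2 7B", "MAT"): (2.41, 1.71),
    ("Llama 2 7B", "Speedup"): (2.02, 1.33),
    ("Llama 2 13B", "MAT"): (2.43, 1.71),
    ("Llama 2 13B", "Speedup"): (2.03, 1.34),
    ("Llama 2 70B", "MAT"): (2.67, 1.67),
    ("Llama 2 70B", "Speedup"): (2.10, 1.35),
    ("Vicuna 7B", "MAT"): (3.28, 1.64),
    ("Vicuna 7B", "Speedup"): (2.61, 1.19),
    ("Vicuna 13B", "MAT"): (2.90, 1.65),
    ("Vicuna 13B", "Speedup"): (2.17, 1.22),
    ("Vicuna 33B", "MAT"): (2.66, 1.65),
    ("Vicuna 33B", "Speedup"): (2.01, 1.41),
}


def relative_gain(new: float, old: float) -> float:
    """How many times larger the newer method's value is than the baseline's."""
    return new / old


for (model, metric), (logitspec, rest) in RESULTS.items():
    print(f"{model:<12} {metric:<8} {relative_gain(logitspec, rest):.2f}x")
```

For example, LogitSpec's MAT on Vicuna 7B (3.28) is exactly twice REST's (1.64), and its speedup there (2.61 vs 1.19) is about 2.19 times REST's.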

Newer alternatives

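Recent methods in the same sub-problem, not yet superseded in the knowledge base: Collaborative Speculative Decoding (CoSpec), ToolSpec, Quasar, FLy (Training-Free Loosely Speculative Decoding), and Pivot-Aware Speculative Decoding.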