SpecVocab
Speculative decoding
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 1 beats it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites SpecVocab as a baseline.
“The latter can become a noticeable bottleneck on GPUs because it involves such operations as global ranking, partial sorting, irregular indexing and gathering a context-dependent subset of weights, which are less efficient than dense matrix multiplication.”
— SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding
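The operations named in the quote can be illustrated with a minimal NumPy sketch. This is not SpecVocab's actual implementation: the function names, the shapes, and the random `scores` array (standing in for some context-dependent token scorer) are all hypothetical. It only contrasts a dense LM head (one matrix product over the full vocabulary) with a head that ranks tokens, partially sorts them, and gathers a subset of weight rows.

```python
import numpy as np

def dense_head(hidden, lm_head):
    """Dense LM head: a single matrix-vector product over the full vocabulary."""
    return lm_head @ hidden

def subset_head(hidden, lm_head, scores, k):
    """Hypothetical vocabulary-subset head (illustrative only).

    scores: per-token relevance scores from an assumed context-dependent scorer.
    """
    # Global ranking via a partial sort: indices of the k highest-scoring tokens.
    top_idx = np.argpartition(scores, -k)[-k:]
    # Irregular indexing: gather a context-dependent subset of weight rows.
    sub_weights = lm_head[top_idx]
    # Small dense product over the gathered rows only.
    return top_idx, sub_weights @ hidden

rng = np.random.default_rng(0)
vocab, dim, k = 1000, 64, 32              # toy sizes, chosen arbitrarily
lm_head = rng.standard_normal((vocab, dim))
hidden = rng.standard_normal(dim)
scores = rng.standard_normal(vocab)       # stand-in for a real scorer

full_logits = dense_head(hidden, lm_head)
idx, sub_logits = subset_head(hidden, lm_head, scores, k)
```

The subset path multiplies fewer rows, but it adds a ranking step and a gather with data-dependent indices, which are the kinds of operations the quoted sentence describes as less GPU-friendly than dense matrix multiplication.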
Beaten on benchmarks
Head-to-head results where a newer method reports beating SpecVocab. Values are copied from the source paper's tables — verify against the cited paper.
- SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding
  SlimSpec beats SpecVocab on speedup in each setting below:
  - Speedup [Batch size = 1]: 1.19 vs 1.16
  - Speedup [Llama3.1-8B-Instruct, temperature 0, batch size 1]: 2.94 vs 2.86
  - Speedup [Llama3.1-8B-Instruct, temperature 0, batch size 64]: 1.53 vs 1.46
  - Speedup [Llama3.1-8B-Instruct, temperature 1, batch size 1]: 2.40 vs 2.36
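To compare the gaps across settings, the four speedup pairs listed above can be converted into relative margins. The short sketch below assumes the first number in each pair is SlimSpec and the second is SpecVocab, which is how the "beats" wording reads; the labels and the helper name `relative_gain_pct` are only illustrative.

```python
# (setting, SlimSpec speedup, SpecVocab speedup), copied from the list above.
# Assumption: the first value in each "a vs b" pair belongs to SlimSpec.
results = [
    ("Batch size = 1", 1.19, 1.16),
    ("Llama3.1-8B-Instruct, temperature 0, batch size 1", 2.94, 2.86),
    ("Llama3.1-8B-Instruct, temperature 0, batch size 64", 1.53, 1.46),
    ("Llama3.1-8B-Instruct, temperature 1, batch size 1", 2.40, 2.36),
]

def relative_gain_pct(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0

gains = [relative_gain_pct(slim, spec) for _, slim, spec in results]

for (setting, _, _), gain in zip(results, gains):
    print(f"{setting}: {gain:.1f}% higher speedup")
```

Under that assumption the relative margins come out between roughly 1.7% and 4.8%, with the largest gap at batch size 64.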
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 11, 2026
- EvoSpec · EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation · Target · Apr 17, 2026
- Apr 8, 2026