VocabTrim
VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
Speculative decoding · first seen Jun 28, 2025
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 1 paper beats it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites VocabTrim as a baseline.
“All tokens outside this vocabulary are assigned zero probability and can never be proposed by the drafter, which typically reduces acceptance quality.”
— SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding
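The mechanism this critique describes can be illustrated with a small sketch. It assumes the drafter's LM head is trimmed to the most frequent tokens of some calibration corpus; the selection rule, the function names (`trim_lm_head`, `draft_probs_full_vocab`), and the toy numbers are all illustrative assumptions, not taken from either paper.

```python
import math

def trim_lm_head(lm_head, token_counts, keep):
    """Keep the LM-head rows of the `keep` most frequent tokens (assumed rule)."""
    ranked = sorted(range(len(token_counts)), key=lambda i: -token_counts[i])
    kept_ids = sorted(ranked[:keep])
    return [lm_head[i] for i in kept_ids], kept_ids

def draft_probs_full_vocab(hidden, trimmed_head, kept_ids, vocab_size):
    """Softmax over the trimmed head, scattered back to full-vocabulary ids."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in trimmed_head]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    full = [0.0] * vocab_size  # tokens outside the trimmed vocabulary stay at zero
    for token_id, e in zip(kept_ids, exps):
        full[token_id] = e / total
    return full

# Toy setup (hypothetical values): 6-token vocabulary, hidden size 2.
lm_head = [[0.1 * (i + 1), -0.05 * i] for i in range(6)]
token_counts = [5, 90, 1, 40, 0, 70]
trimmed_head, kept_ids = trim_lm_head(lm_head, token_counts, keep=3)
full = draft_probs_full_vocab([1.0, 2.0], trimmed_head, kept_ids, vocab_size=6)
```

In this sketch the drafter's matrix-vector product covers 3 rows instead of 6, which is where the saving comes from, while tokens 0, 2, and 4 receive exactly zero draft probability. Whenever the target model would have chosen one of those tokens, the draft cannot match it, which is the acceptance cost the quoted sentence points to.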
Beaten on benchmarks
Head-to-head results where a newer method reports beating VocabTrim. Values are copied from the source paper's tables — verify against the cited paper.
- SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding
SlimSpec beats VocabTrim · Speedup [Batch size = 1]
1.19 vs 1.08
- SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding
SlimSpec beats VocabTrim · Speedup [Llama3.1-8B-Instruct, temperature 0, batch size 1]
2.94 vs 2.71
- SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding
SlimSpec beats VocabTrim · Speedup [Llama3.1-8B-Instruct, temperature 0, batch size 64]
1.53 vs 1.46
- SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding
SlimSpec beats VocabTrim · Speedup [Llama3.1-8B-Instruct, temperature 1, batch size 1]
2.40 vs 2.11
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 11, 2026
- EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation · Target · Apr 17, 2026
- Apr 8, 2026