FR-Spec
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
Speculative decoding · first seen Feb 20, 2025
superseded — cited as a baseline and beaten by newer methods
4 papers critique it · 5 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites FR-Spec as a baseline.
“static methods rely on the assumption of static word frequencies and fail to capture long-tail tokens that become locally probable in specialized domains or during topic shifts”
— EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation
“While effective in reducing latency, these static subsets may suppress rare or domain-specific tokens, lowering acceptance in speculative decoding.”
— DynaSpec: Context-aware Dynamic Speculative Sampling for Large-Vocabulary Language Models
“are context-insensitive and struggle with long-tail tokens, leading to lower acceptance rates in diverse scenarios.”
— MicroSpec: Accelerating Speculative Decoding with Lightweight In-Context Vocabularies
“All tokens outside this vocabulary are assigned zero probability and can never be proposed by the drafter, which typically reduces acceptance quality.”
— SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding
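The critiques above all describe the same mechanism: a drafter limited to a fixed, frequency-ranked vocabulary cannot propose a token outside that subset, however likely the token is in the current context. The following is a minimal sketch of that effect in plain Python, not FR-Spec's actual implementation. The function names `build_draft_vocab` and `restricted_softmax`, the toy corpus, and the logits are all assumptions made for illustration.

```python
from collections import Counter
import math

def build_draft_vocab(corpus_token_ids, k):
    """Pick the k most frequent token ids in a corpus (a static subset)."""
    counts = Counter(corpus_token_ids)
    return [tok for tok, _ in counts.most_common(k)]

def restricted_softmax(logits, draft_vocab):
    """Softmax over draft_vocab only; every other token gets probability 0."""
    m = max(logits[t] for t in draft_vocab)  # subtract max for numerical stability
    exps = {t: math.exp(logits[t] - m) for t in draft_vocab}
    z = sum(exps.values())
    probs = [0.0] * len(logits)
    for t, e in exps.items():
        probs[t] = e / z
    return probs

# Toy setup (hypothetical): 6-token vocabulary, token 5 is rare in the corpus.
corpus = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5]
draft_vocab = build_draft_vocab(corpus, k=4)

# In this context the logits strongly favor the rare token 5,
# but the restricted drafter can only spread mass over its subset.
logits = [0.1, 0.2, 0.0, 0.3, 0.1, 4.0]
draft_probs = restricted_softmax(logits, draft_vocab)

print("draft vocabulary:", draft_vocab)
print("P(token 5) under the restricted drafter:", draft_probs[5])
```

In this toy case the drafter's best guess is a token from the frequent subset, so a target model that prefers token 5 would reject the draft at this position. That is the lower-acceptance behavior the citing papers point to.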
Beaten on benchmarks
Head-to-head results where a newer method reports beating FR-Spec. Values are copied from the source paper's tables — verify against the cited paper.
- Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding
Domino beats FR-Spec · Speedup by concurrency, Domino vs FR-Spec [GSM8K, Concurrency]
concurrency=1: 3.01 vs 2.77
concurrency=2: 2.70 vs 2.52
concurrency=4: 2.61 vs 2.36
concurrency=8: 2.18 vs 2.09
concurrency=16: 1.68 vs 1.53
concurrency=32: 1.24 vs 1.16
Domino beats FR-Spec · Speedup by concurrency, Domino vs FR-Spec [HumanEval, Concurrency]
concurrency=1: 2.82 vs 2.67
concurrency=2: 2.64 vs 2.43
concurrency=4: 2.52 vs 2.30
concurrency=8: 2.23 vs 2.03
concurrency=16: 1.63 vs 1.50
concurrency=32: 1.12 vs 1.02
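To read the head-to-head numbers in relative terms, the short sketch below computes Domino's percentage speedup gain over FR-Spec at each concurrency level. The values are copied from the entries above; the dictionary layout and the helper name `relative_gain_pct` are illustrative assumptions, and the figures still need checking against the Domino paper's tables.

```python
# Speedups copied from the head-to-head entries: (Domino, FR-Spec) per concurrency level.
results = {
    "GSM8K": {1: (3.01, 2.77), 2: (2.70, 2.52), 4: (2.61, 2.36),
              8: (2.18, 2.09), 16: (1.68, 1.53), 32: (1.24, 1.16)},
    "HumanEval": {1: (2.82, 2.67), 2: (2.64, 2.43), 4: (2.52, 2.30),
                  8: (2.23, 2.03), 16: (1.63, 1.50), 32: (1.12, 1.02)},
}

def relative_gain_pct(new, base):
    """Percentage by which `new` exceeds `base`."""
    return (new / base - 1.0) * 100.0

gains = {
    bench: {c: relative_gain_pct(domino, frspec) for c, (domino, frspec) in rows.items()}
    for bench, rows in results.items()
}

for bench, rows in gains.items():
    for c, g in rows.items():
        print(f"{bench} concurrency={c}: Domino speedup is {g:.1f}% above FR-Spec")
```

On these figures the relative gain stays between roughly 4% and 11% at every concurrency level, even as the absolute speedups of both methods shrink toward 1 at concurrency=32.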
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 11, 2026
- EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation · Apr 17, 2026
- Apr 8, 2026