Superseded baseline · #136 of 151 most-superseded
SmartSpec
Speculative decoding
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 0 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites SmartSpec as a baseline.
“These SLO-oriented speculation techniques have two key problems: (i) they are designed for non-latency critical scenario of batch sizes that make decoding closer to compute intensive "knee" of the GPU, and (ii) they employ analytical modeling to predict model execution time, as they cater to dense models. Single-batch MoE serving is highly memory bound, rendering OI-centric heuristics uneffective. Moreover, analytically modeling MoE execution time would not work, as the verification time varies depending from request-to-request and even across iterations.”
— Utility-Driven Speculative Decoding for Mixture-of-Experts
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Nov 3, 2025