Superseded baseline · #139 of 151 most-superseded
SpecServe
SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding
Speculative decoding · first seen Mar 7, 2025
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 0 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites SpecServe as a baseline.
“These SLO-oriented speculation techniques have two key problems: (i) they are designed for non-latency critical scenario of batch sizes that make decoding closer to compute intensive "knee" of the GPU, and (ii) they employ analytical modeling to predict model execution time, as they cater to dense models. Single-batch MoE serving is highly memory bound, rendering OI-centric heuristics uneffective. Moreover, analytically modeling MoE execution time would not work, as the verification time varies depending from request-to-request and even across iterations.”
— Utility-Driven Speculative Decoding for Mixture-of-Experts
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Nov 3, 2025