Superseded baseline · #26 of 151 most-superseded
Sequoia
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding · Speculative decoding · first seen Feb 19, 2024
superseded — cited as a baseline and beaten by newer methods
4 papers critique it · 0 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Sequoia as a baseline.
“methods that scale better with more draft tokens rely on static tree structures that may not be optimal for every setting, as they require tree structure optimization for every change in the text domain, generation parameters, and the hardware setup.”
— SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
“However, fixed patterns usually struggle to generalize to diverse query distributions, resulting in a relatively low acceptance rate as tree size grows.”
— DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure
“Under Sequoia's positional acceptance assumption, the probability of accepting a token depends only on its rank among siblings---all sources share a single acceptance vector regardless of origin.”
— Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding
“Tree-based verification (Sequoia, SpecExec) boosts acceptance but often increases compute”
— Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
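The positional acceptance assumption quoted above can be made concrete with a small sketch. It assumes a single acceptance vector indexed by sibling rank, and scores a static tree by summing, over all nodes, the product of rank-based acceptance probabilities along the path from the root. The names `Node`, `expected_accepted`, and `accept_by_rank`, and the numeric values, are hypothetical and chosen for illustration; this is a simplified reading of the assumption, not Sequoia's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node in a static speculation tree; children are ordered by rank."""
    children: List["Node"] = field(default_factory=list)


def expected_accepted(node: Node, accept_by_rank: List[float],
                      reach_prob: float = 1.0) -> float:
    """Expected number of tokens produced by verifying the subtree at `node`.

    Assumption (illustrative): a child at sibling rank k is accepted with
    probability accept_by_rank[k] given that its parent was accepted,
    regardless of depth or of where the draft token came from.
    """
    total = reach_prob  # this node contributes the probability of reaching it
    for rank, child in enumerate(node.children):
        if rank >= len(accept_by_rank):
            break  # ranks beyond the vector are treated as never accepted
        total += expected_accepted(
            child, accept_by_rank, reach_prob * accept_by_rank[rank]
        )
    return total


# Hypothetical acceptance vector: rank 0 accepted half the time, rank 1 a quarter.
accept_by_rank = [0.5, 0.25]

# Two trees with the same budget of three speculated tokens.
chain = Node([Node([Node([Node()])])])      # root -> a -> b -> c
branched = Node([Node([Node()]), Node()])   # root -> {a -> c, b}

chain_score = expected_accepted(chain, accept_by_rank)
branched_score = expected_accepted(branched, accept_by_rank)
```

Under these made-up numbers the branched tree scores higher than the chain for the same token budget. Because the score depends on the acceptance vector, a tree tuned for one vector need not stay optimal when the text domain, generation parameters, or hardware change, which is the concern the SpecExec and DySpec quotes raise about static structures.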
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 4, 2026
- component-aware self-speculative decoding · Component-Aware Self-Speculative Decoding in Hybrid Language Models · May 1, 2026
- Apr 22, 2026
- Apr 16, 2026
- Apr 2, 2026
- greedy multi-path block verification (GBV) · Greedy Multi-Path Block Verification for Faster Decoding in Speculative Sampling · Feb 18, 2026
- SDFP · SDFP: Speculative Decoding with FIT-Pruned Models for Training-Free and Plug-and-Play LLM Acceleration · Feb 5, 2026
- Feb 1, 2026
- CAS-Spec (Cascade Adaptive Self-Speculative Decoding) · CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs · Oct 30, 2025
- Oct 26, 2025
- Oct 17, 2025
- Oct 1, 2025