Is Sequoia superseded?

Sequoia (speculative decoding) is marked superseded: cited as a baseline and beaten by newer methods. 4 papers critique it and 0 beat it on benchmarks, which ranks it #26 of 151 most-superseded. Sub-problem: cluster led by SpecInfer. Newer alternatives in the same sub-problem include SpecKV, component-aware self-speculative decoding, FASER, ConfLayers, and Goose.


Superseded baseline · #26 of 151 most-superseded

Sequoia

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding

Speculative decoding · first seen Feb 19, 2024

superseded — cited as a baseline and beaten by newer methods

4 papers critique it · 0 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites Sequoia as a baseline.

“methods that scale better with more draft tokens rely on static tree structures that may not be optimal for every setting, as they require tree structure optimization for every change in the text domain, generation parameters, and the hardware setup.”
— SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
“However, fixed patterns usually struggle to generalize to diverse query distributions, resulting in a relatively low acceptance rate as tree size grows.”
— DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure
“Under Sequoia's positional acceptance assumption, the probability of accepting a token depends only on its rank among siblings---all sources share a single acceptance vector regardless of origin.”
— Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding
“Tree-based verification (Sequoia, SpecExec) boosts acceptance but often increases compute”
— Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
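
The positional acceptance assumption described in the Goose quote above can be illustrated with a small sketch. It assumes a draft tree encoded as nested lists (each node is the list of its children) and a hypothetical vector `accept_by_rank`, where entry `k` is the probability that the child of rank `k` among its siblings is accepted. The function name, the tree encoding, and the numbers are illustrative assumptions, not code from Sequoia or any of the cited papers.

```python
def expected_accepted_tokens(tree, accept_by_rank, reach=1.0):
    """Expected number of accepted draft tokens in `tree`.

    `tree` is the list of children of the current node; each child is
    itself a list of its own children (a leaf is an empty list).
    `accept_by_rank[k]` is the assumed acceptance probability of the
    child with sibling rank k, independent of where the node sits in
    the tree or which source proposed it (the positional assumption).
    `reach` is the probability that the current node was accepted.
    """
    total = 0.0
    for rank, child in enumerate(tree):
        # Ranks beyond the vector are treated as never accepted.
        p = accept_by_rank[rank] if rank < len(accept_by_rank) else 0.0
        child_reach = reach * p  # product of probabilities along the path
        total += child_reach
        total += expected_accepted_tokens(child, accept_by_rank, child_reach)
    return total


# Hypothetical acceptance vector: rank 0 accepted half the time, rank 1 a quarter.
accept_by_rank = [0.5, 0.25]

chain = [[[]]]           # root -> one child -> one grandchild
wide = [[], []]          # root -> two leaf children
mixed = [[[], []], []]   # first child has two children, second is a leaf

for name, tree in [("chain", chain), ("wide", wide), ("mixed", mixed)]:
    print(name, expected_accepted_tokens(tree, accept_by_rank))
```

Because `accept_by_rank` is indexed only by sibling rank, every node at the same rank gets the same probability regardless of which drafting source produced it. That single shared vector is the property the Goose authors single out.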

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.