SWIFT (speculative decoding): superseded. It is cited as a baseline and outperformed by newer methods. Four papers critique it and two report beating it on benchmarks, which ranks it #15 of 151 among the most-superseded methods. Sub-problem: the cluster led by SpecInfer. Newer alternatives in the same sub-problem include SpecKV, component-aware self-speculative decoding, FASER, ConfLayers, and Goose.

Is SWIFT superseded? Critiques, benchmarks & alternatives

What papers say

Verbatim critique sentences, each from a paper that cites SWIFT as a baseline.

“While SWIFT only allows a fixed skipping rate, Conflayers does not limit the search space to a certain number of layers to skip and expands the exploration set to any number of layers below or above a pre-defined threshold while conditioning the search on the performance of the draft model.”
— ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding
“requires substantial workload-specific tuning to isolate layers and operations that can be skipped while maintaining high token acceptance rates”
— HiSpec: Hierarchical Speculative Decoding for LLMs
“The most closely related framework to ours is SWIFT (Xia et al., 2025), which adaptively selects subsets of layers to skip during inference under a speculative decoding paradigm. By treating the same LLM as both draft and verifier via dynamic layer selection, SWIFT achieves lossless acceleration without introducing new modules or supervision. However, SWIFT still requires iterative Bayesian optimization to identify the optimal layer subsets, which can be computationally expensive prior to deployment.”
— SDFP: Speculative Decoding with FIT-Pruned Models for Training-Free and Plug-and-Play LLM Acceleration
“All existing self-speculative methods share a common assumption: the model is a homogeneous stack of similar layers, and the drafting strategy consists of skipping or shortcutting some of these layers. This assumption breaks down in hybrid architectures, where layers contain fundamentally different computational components.”
— Component-Aware Self-Speculative Decoding in Hybrid Language Models
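The critiques above all concern the same mechanism: one model acts as its own draft model by skipping layers, and the full model then verifies the drafted tokens. The toy sketch below illustrates that draft-then-verify loop under greedy decoding. It is not SWIFT's implementation: the "layers" are hypothetical integer functions, the skip set is fixed by hand rather than found by the Bayesian optimization the SDFP quote mentions, and verification is done token by token instead of in one parallel pass. All function names are assumptions made for illustration.

```python
from typing import AbstractSet, Callable, List, Sequence

Layer = Callable[[int], int]
MOD = 1009  # size of the toy hidden-state space (arbitrary choice)


def make_layers(n: int) -> List[Layer]:
    """Build hypothetical toy layers acting on an integer hidden state."""
    def make(i: int) -> Layer:
        if i % 2 == 0:
            # Even layers change the state substantially.
            return lambda h: (h * 31 + i) % MOD
        # Odd layers only occasionally change the state, so skipping
        # them often leaves the prediction unchanged.
        return lambda h: h + 1 if h % 5 == 0 else h
    return [make(i) for i in range(n)]


def next_token(layers: Sequence[Layer], context: Sequence[int],
               skip: AbstractSet[int] = frozenset(), vocab: int = 50) -> int:
    """Greedy next token; layers whose index is in `skip` are not run."""
    h = sum((pos + 1) * tok for pos, tok in enumerate(context)) % MOD
    for i, layer in enumerate(layers):
        if i in skip:
            continue
        h = layer(h)
    return h % vocab


def greedy_decode(layers: Sequence[Layer], prompt: Sequence[int],
                  num_new: int) -> List[int]:
    """Reference output: plain autoregressive decoding with all layers."""
    out = list(prompt)
    for _ in range(num_new):
        out.append(next_token(layers, out))
    return out[len(prompt):]


def self_speculative_decode(layers: Sequence[Layer], prompt: Sequence[int],
                            skip: AbstractSet[int], num_new: int,
                            draft_len: int = 4) -> List[int]:
    """Draft with skipped layers, then verify with the full model."""
    out = list(prompt)
    target = len(prompt) + num_new
    while len(out) < target:
        # Draft phase: the same model, minus the skipped layers.
        draft: List[int] = []
        for _ in range(min(draft_len, target - len(out))):
            draft.append(next_token(layers, out + draft, skip))
        # Verify phase: keep the full model's token at every position and
        # stop at the first disagreement with the draft.
        for tok in draft:
            full = next_token(layers, out)
            out.append(full)
            if full != tok:
                break
    return out[len(prompt):]


if __name__ == "__main__":
    layers = make_layers(8)
    prompt = [1, 2, 3]
    fast = self_speculative_decode(layers, prompt, {1, 3, 5, 7}, num_new=20)
    print(fast == greedy_decode(layers, prompt, num_new=20))
```

Because every emitted token comes from the full model, the output matches plain greedy decoding whatever skip set is chosen. The skip set only affects how many drafted tokens are accepted, which is why the cited papers focus on how that set is selected.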

Beaten on benchmarks

Head-to-head results where a newer method reports beating SWIFT. Each entry gives the newer method's value first and SWIFT's second. Values are copied from the source papers' tables; verify them against the cited paper.

ConfLayers beats SWIFT · Speedup (ConfLayers vs SWIFT)
Source: ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding
LLaMa-2-13B CNN-DM: 1.16 vs 0.92
LLaMa-2-70B CNN-DM: 1.37 vs 1.30
LLaMa-3-8B CNN-DM: 1.10 vs 1.08
LLaMa-3-70B CNN-DM: 1.38 vs 1.26
Average LLaMa-2-70B: 1.35 vs 1.24
CodeLLaMa-34B HumanEval: 1.24 vs 1.06
Qwen2.5-Math-72B GSM8K: 1.22 vs 1.15

CAS-Spec beats SWIFT · Overall (CAS-Spec vs SWIFT)
Source: CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs
7B model: 1.578 vs 1.064
13B model: 1.524 vs 1.119
33B model: 1.481 vs 1.206
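
To compare the pairs above on a common scale, one option is to convert each pair into a relative gain over SWIFT. The sketch below does this for the ConfLayers figures, assuming both numbers in a pair are speedups measured against the same baseline in the same setup, which this page does not confirm. The function name `relative_gain` is hypothetical.

```python
from typing import Dict, Tuple

# (newer method, SWIFT) speedup pairs copied from the list above.
CONFLAYERS_VS_SWIFT: Dict[str, Tuple[float, float]] = {
    "LLaMa-2-13B CNN-DM": (1.16, 0.92),
    "LLaMa-2-70B CNN-DM": (1.37, 1.30),
    "LLaMa-3-8B CNN-DM": (1.10, 1.08),
    "LLaMa-3-70B CNN-DM": (1.38, 1.26),
    "CodeLLaMa-34B HumanEval": (1.24, 1.06),
    "Qwen2.5-Math-72B GSM8K": (1.22, 1.15),
}


def relative_gain(newer: float, swift: float) -> float:
    """Fractional gain of the newer method over SWIFT.

    Assumes both values are speedups over the same baseline, so their
    ratio compares the two methods directly.
    """
    return newer / swift - 1.0


if __name__ == "__main__":
    for setting, (newer, swift) in CONFLAYERS_VS_SWIFT.items():
        print(f"{setting}: {relative_gain(newer, swift):+.1%} over SWIFT")
```

A value below 1.0, such as the 0.92 listed for SWIFT on LLaMa-2-13B, would mean a slowdown if the figure is a speedup relative to standard autoregressive decoding.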

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.