SWIFT
Speculative decoding
superseded — cited as a baseline and beaten by newer methods
4 papers critique it · 2 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites SWIFT as a baseline.
“While SWIFT only allows a fixed skipping rate, Conflayers does not limit the search space to a certain number of layers to skip and expands the exploration set to any number of layers below or above a pre-defined threshold while conditioning the search on the performance of the draft model.”
— ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding
“requires substantial workload-specific tuning to isolate layers and operations that can be skipped while maintaining high token acceptance rates”
— HiSpec: Hierarchical Speculative Decoding for LLMs
“The most closely related framework to ours is SWIFT (Xia et al., 2025), which adaptively selects subsets of layers to skip during inference under a speculative decoding paradigm. By treating the same LLM as both draft and verifier via dynamic layer selection, SWIFT achieves lossless acceleration without introducing new modules or supervision. However, SWIFT still requires iterative Bayesian optimization to identify the optimal layer subsets, which can be computationally expensive prior to deployment.”
— SDFP: Speculative Decoding with FIT-Pruned Models for Training-Free and Plug-and-Play LLM Acceleration
“All existing self-speculative methods share a common assumption: the model is a homogeneous stack of similar layers, and the drafting strategy consists of skipping or shortcutting some of these layers. This assumption breaks down in hybrid architectures, where layers contain fundamentally different computational components.”
— Component-Aware Self-Speculative Decoding in Hybrid Language Models
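The quotes above describe SWIFT as using one model as both draft and verifier by skipping layers during drafting. The following is a minimal toy sketch of that draft-then-verify idea with greedy decoding, not SWIFT's actual implementation: the integer "layers", `make_layers`, `next_token`, the fixed `skip` set, and `draft_len` are all hypothetical stand-ins, and a real system would verify the drafted tokens in a single batched forward pass rather than one call per token.

```python
from typing import Callable, FrozenSet, List, Sequence

Layer = Callable[[int], int]
VOCAB = 50  # hypothetical toy vocabulary size


def make_layers(n: int) -> List[Layer]:
    # Hypothetical toy "layers": each one mixes an integer hidden state.
    return [lambda h, k=k: (h * 31 + k * 7 + 3) % 1009 for k in range(n)]


def next_token(layers: Sequence[Layer], context: Sequence[int],
               skip: FrozenSet[int] = frozenset()) -> int:
    # Build a toy hidden state from the last few tokens, then run the
    # layer stack, omitting any layer index listed in `skip`.
    h = 0
    for tok in context[-4:]:
        h = (h * 53 + tok + 1) % 1009
    for i, layer in enumerate(layers):
        if i not in skip:
            h = layer(h)
    return h % VOCAB


def greedy_decode(layers, prompt, n_new):
    # Reference: plain greedy decoding with the full layer stack.
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_token(layers, out))
    return out


def self_speculative_decode(layers, prompt, n_new, skip, draft_len=4):
    out = list(prompt)
    target = len(prompt) + n_new
    while len(out) < target:
        # Draft: the same model with some layers skipped proposes tokens.
        draft: List[int] = []
        for _ in range(draft_len):
            draft.append(next_token(layers, out + draft, skip))
        # Verify: the full model recomputes each position. Its token is
        # always the one kept, so a mismatch acts as a correction and
        # ends the current round.
        for tok in draft:
            true_tok = next_token(layers, out)
            out.append(true_tok)
            if true_tok != tok or len(out) >= target:
                break
    return out


layers = make_layers(8)
prompt = [1, 2, 3]
spec = self_speculative_decode(layers, prompt, 20, skip=frozenset({2, 5}))
print(spec == greedy_decode(layers, prompt, 20))
```

Because every kept token comes from the full layer stack, the output matches plain greedy decoding regardless of which layers the draft skips; the choice of skipped layers only affects how many drafted tokens are accepted per round, which is the quantity the critiques above focus on.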
Beaten on benchmarks
Head-to-head results where a newer method reports beating SWIFT. Values are copied from the source paper's tables — verify against the cited paper.
- ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding
ConfLayers beats SWIFT · Speedup
[LLaMa-2-13B CNN-DM] 1.16 vs 0.92
[LLaMa-2-70B CNN-DM] 1.37 vs 1.30
[LLaMa-3-8B CNN-DM] 1.10 vs 1.08
[LLaMa-3-70B CNN-DM] 1.38 vs 1.26
[Average LLaMa-2-70B] 1.35 vs 1.24
[CodeLLaMa-34B HumanEval] 1.24 vs 1.06
[Qwen2.5-Math-72B GSM8K] 1.22 vs 1.15
- CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs
CAS-Spec beats SWIFT · Overall
[7B model] 1.578 vs 1.064
[13B model] 1.524 vs 1.119
[33B model] 1.481 vs 1.206
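To compare the gaps across settings, the pairs above can be turned into relative gains. The sketch below copies the values listed in this section and assumes, as the "beats" labels imply, that higher is better for both the Speedup and Overall metrics; `relative_gain` is a hypothetical helper name, not something from the cited papers.

```python
# (newer method, setting, newer value, SWIFT value), copied from the list above.
RESULTS = [
    ("ConfLayers", "LLaMa-2-13B CNN-DM", 1.16, 0.92),
    ("ConfLayers", "LLaMa-2-70B CNN-DM", 1.37, 1.30),
    ("ConfLayers", "LLaMa-3-8B CNN-DM", 1.10, 1.08),
    ("ConfLayers", "LLaMa-3-70B CNN-DM", 1.38, 1.26),
    ("ConfLayers", "Average LLaMa-2-70B", 1.35, 1.24),
    ("ConfLayers", "CodeLLaMa-34B HumanEval", 1.24, 1.06),
    ("ConfLayers", "Qwen2.5-Math-72B GSM8K", 1.22, 1.15),
    ("CAS-Spec", "7B model", 1.578, 1.064),
    ("CAS-Spec", "13B model", 1.524, 1.119),
    ("CAS-Spec", "33B model", 1.481, 1.206),
]


def relative_gain(new: float, swift: float) -> float:
    # Fractional improvement of the newer method over SWIFT's reported value.
    return new / swift - 1.0


for method, setting, new, swift in RESULTS:
    gain = relative_gain(new, swift)
    print(f"{method:<10} {setting:<26} {new:.3f} vs {swift:.3f}  (+{gain:.1%})")
```

Note that one listed SWIFT value (0.92 on LLaMa-2-13B CNN-DM) is below 1.0; if the metric is speedup over standard autoregressive decoding, that would indicate a slowdown in that setting. As with the list itself, these figures should be checked against the cited papers.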
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 4, 2026
- component-aware self-speculative decoding · Component-Aware Self-Speculative Decoding in Hybrid Language Models · May 1, 2026
- Apr 22, 2026
- Apr 16, 2026
- Apr 2, 2026
- greedy multi-path block verification (GBV) · Greedy Multi-Path Block Verification for Faster Decoding in Speculative Sampling · Feb 18, 2026
- SDFP · SDFP: Speculative Decoding with FIT-Pruned Models for Training-Free and Plug-and-Play LLM Acceleration · Feb 5, 2026
- Feb 1, 2026
- CAS-Spec (Cascade Adaptive Self-Speculative Decoding) · CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs · Oct 30, 2025
- Oct 26, 2025
- Oct 17, 2025
- Oct 1, 2025