Lookahead
Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
Speculative decoding · first seen Dec 20, 2023
Heavily superseded: a standard baseline that newer methods routinely beat.
8 papers critique it · 13 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Lookahead as a baseline.
“This would significantly increase the in-flight tokens (by $G$) for MoEs, and our evaluations reveal that even vanilla n-gram decoding ($G=1$) incurs high costs.”
— Utility-Driven Speculative Decoding for Mixture-of-Experts
“some speculative decoding approaches, such as Lookahead, rely solely on N-gram or retrieval-based heuristics for drafting. While such methods incur negligible drafting latency, their limited predictive accuracy typically leads to very low average acceptance length $$, resulting in modest end-to-end speedups.”
— DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference
“These methods refine multiple tokens simultaneously using the model's internal attention mechanism, eliminating the need for explicit draft weights, though often yielding shorter acceptance lengths compared to model-based drafters.”
— Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification
“this approach makes each decode step significantly more computationally expensive”
— HiSpec: Hierarchical Speculative Decoding for LLMs
“While these methods substantially improve proposal quality, their objectives are typically defined at the token or local distribution level, leaving the window-level and prefix-sensitive nature of speculative verification less explicitly optimized.”
— Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing
“Prompt- and retrieval-based approaches (PLD, Lookahead, CLLMs) improve draft quality but degrade with scarce context”
— Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
“However, these methods exhibit lower accuracy and greater resource use compared to our approach. They demand more memory and GPU processing power, posing challenges in resource-scarce settings.”
— Adaptive Draft-Verification for Efficient Large Language Model Decoding
“However, due to its lower efficiency in generating draft tokens compared to Medusa, its end-to-end speedup ratio is slightly lower than that of Medusa”
— Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
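Several critiques above hinge on how heuristic drafters behave: the draft is copied from earlier text instead of being predicted by a model, so drafting is nearly free, but tokens are accepted only when the text actually repeats. The following Python sketch illustrates that trade-off with a prompt-lookup-style n-gram drafter and greedy verification. It is a hypothetical simplification, not Lookahead's own algorithm; `draft_ngram`, `verify_greedy`, `speculative_generate`, `toy_target`, and the parameters `n` (match length) and `k` (draft length) are illustrative names, and `target_next` stands in for a target model's argmax next-token function.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for the target LLM: maps a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def draft_ngram(context: List[int], n: int = 2, k: int = 4) -> List[int]:
    """Propose up to k draft tokens by copying what followed the most recent
    earlier occurrence of the context's last n tokens (no draft model needed)."""
    if len(context) < n:
        return []
    suffix = context[-n:]
    # Scan backwards, skipping the suffix itself, so every match has a continuation.
    for start in range(len(context) - n - 1, -1, -1):
        if context[start:start + n] == suffix:
            return context[start + n:start + n + k]
    return []  # No match: nothing to speculate, fall back to one token per step.


def verify_greedy(context: List[int], draft: List[int], target_next: NextToken) -> List[int]:
    """Keep the longest draft prefix that agrees with the target's greedy choices,
    then add one token from the target, so every step yields at least one token.
    Real systems score all draft positions in a single forward pass; this loop
    calls target_next per position only for readability."""
    accepted: List[int] = []
    for tok in draft:
        if tok != target_next(context + accepted):
            break
        accepted.append(tok)
    accepted.append(target_next(context + accepted))
    return accepted


def speculative_generate(prompt: List[int], target_next: NextToken, max_new: int,
                         n: int = 2, k: int = 4) -> Tuple[List[int], int]:
    """Draft-then-verify loop; returns (new tokens, number of draft-and-verify steps)."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new:
        out.extend(verify_greedy(out, draft_ngram(out, n, k), target_next))
        steps += 1
    return out[len(prompt):len(prompt) + max_new], steps


def greedy_generate(prompt: List[int], target_next: NextToken, max_new: int) -> List[int]:
    """Plain autoregressive greedy decoding, one target call per token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target_next(out))
    return out[len(prompt):]


# Toy target whose output repeats, the regime where copy-based drafts are accepted.
PATTERN = [5, 6, 7, 8]


def toy_target(ctx: List[int]) -> int:
    return PATTERN[len(ctx) % len(PATTERN)]


prompt = [5, 6, 7, 8, 5, 6]
tokens, steps = speculative_generate(prompt, toy_target, max_new=12)
assert tokens == greedy_generate(prompt, toy_target, 12)  # lossless under greedy decoding
print(f"{len(tokens)} tokens in {steps} draft-and-verify steps")
```

With the repeating toy target, each step accepts the whole four-token draft plus one target token, so 12 tokens need only 3 draft-and-verify steps. With a target that never repeats, no n-gram ever matches and the loop degrades to one token per step, the low-acceptance regime the DART quote describes. Both paths emit exactly the greedy output, which is what "lossless" means here.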
Beaten on benchmarks
Head-to-head results in which a newer method reports beating Lookahead. Values are copied from the source papers' tables; verify them against the cited paper.
- ECHO: Elastic Speculative Decoding with Sparse Gating for High-Concurrency Scenarios
ECHO beats Lookahead · Avg. Speedup [Vicuna-13B]
5.25 vs 1.60
- DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference
DART beats Lookahead · Speedup [Llama 2 7B, temperature = 0]
2.85 vs 1.61
- LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation
LogitSpec beats Lookahead · MAT (mean accepted tokens) and Speedup, LogitSpec vs Lookahead
Llama 2 7B · MAT 2.41 vs 1.58 · Speedup 2.02 vs 1.36
Llama 2 13B · MAT 2.43 vs 1.56 · Speedup 2.03 vs 1.18
Llama 2 70B · MAT 2.67 vs 1.53 · Speedup 2.10 vs 1.28
Vicuna 7B · MAT 3.28 vs 1.54 · Speedup 2.61 vs 1.28
Vicuna 13B · MAT 2.90 vs 1.46 · Speedup 2.17 vs 1.12
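Mean accepted tokens and wall-clock speedup differ because each draft-and-verify step costs more than one plain decoding step. Assuming a simple cost model that is not taken from the cited papers, speedup ≈ MAT / c, where c is the cost of one speculative step relative to one vanilla step, the LogitSpec rows above imply a value of c for each method. The Python sketch below computes it; `REPORTED` and `implied_step_cost` are illustrative names.

```python
# (MAT, speedup) pairs copied from the LogitSpec vs Lookahead comparison above.
REPORTED = {
    "Llama 2 7B": {"LogitSpec": (2.41, 2.02), "Lookahead": (1.58, 1.36)},
    "Llama 2 13B": {"LogitSpec": (2.43, 2.03), "Lookahead": (1.56, 1.18)},
    "Llama 2 70B": {"LogitSpec": (2.67, 2.10), "Lookahead": (1.53, 1.28)},
    "Vicuna 7B": {"LogitSpec": (3.28, 2.61), "Lookahead": (1.54, 1.28)},
    "Vicuna 13B": {"LogitSpec": (2.90, 2.17), "Lookahead": (1.46, 1.12)},
}


def implied_step_cost(mat: float, speedup: float) -> float:
    """Relative cost c of one draft-and-verify step versus one vanilla decoding
    step, under the simplified (assumed) model speedup = MAT / c."""
    return mat / speedup


for model, methods in REPORTED.items():
    costs = {m: implied_step_cost(*vals) for m, vals in methods.items()}
    print(f"{model:<12} " + "  ".join(f"{m}: c={c:.2f}" for m, c in costs.items()))
```

Under this assumed model, both methods land at roughly c ≈ 1.16 to 1.34, so in these rows the speedup gap tracks the MAT gap rather than a difference in per-step overhead. The model ignores batch size, draft-tree width, and hardware effects, so treat it only as a consistency check on the reported numbers.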
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Collaborative Speculative Decoding (CoSpec) · Beyond the Target: From Imitation to Collaboration in Speculative Decoding · May 24, 2026
- ToolSpec · ToolSpec: Accelerating Tool Calling via Schema-Aware and Retrieval-Augmented Speculative Decoding · Apr 15, 2026
- Quasar · Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification · Mar 2, 2026
- FLy (Training-Free Loosely Speculative Decoding) · Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match · Nov 28, 2025
- Nov 1, 2025
- Oct 30, 2025
- Oct 22, 2025
- Oct 8, 2025
- Group Tree Optimization (GTO) · Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding · Sep 26, 2025
- Sep 22, 2025