EAGLE
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Speculative decoding · first seen Jan 26, 2024
heavily superseded — a standard baseline that newer methods routinely beat
19 papers critique it · 12 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites EAGLE as a baseline.
“there exists a conflict between the feature-level and the logit-level losses. The feature-level loss is introduced to facilitate knowledge distillation. However, to the best of our knowledge, there is currently no research indicating that the knowledge of an LLM can be distilled into a single transformer decoder layer. We believe that performing strict knowledge distillation between the target LLM and a lightweight draft model is unrealistic.”
— Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation
“these methods still necessitate a computationally expensive projection over the full vocabulary at the draft head's final layer, which persists as a major latency source”
— EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation
“the top-$k$ tokens within the same layer of the draft tree remain coupled, which limits the diversity and potential of the predictions.”
— Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE
“The shape of EAGLE's draft tree is fixed, with the drafting phase filling in the corresponding positions. EAGLE-2 aims to improve this by introducing a dynamically adjustable draft tree.”
— EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
“Tree-attention frameworks—SpecInfer~miao2024specinfer, Medusa~cai2024medusa, and Eagle~li2024eagle, fan2026flatter—expand many branches, quickly exhausting memory.”
— Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
“its structure is static with no dependence on the draft model output”
— Dynamic Depth Decoding: Faster Speculative Decoding for LLMs
“Due to the sampling results at the token layer being hidden, feature-level autoregression introduces uncertainty.”
— EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
“For MoE models, however, this assumption breaks down; we analyze this problem in sec:method:problem.”
— MoE-Spec: Expert Budgeting for Efficient Speculative Decoding
“only EAGLE~li2024eagle presents results for batch sizes $\le 4$ but doesn't discuss larger batch sizes.”
— EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models
“Prevailing methods, medusa, li2024eagle, vicuna68m use small drafters simply trained on datasets such as ShareGPT sharegpt which is often used for instruction tuning of LLMs to learn a pattern of target LLM's language modeling. However, our investigations reveal that such approaches are insufficient for multilingual translation.”
— Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
“target-dependent methods such as the EAGLE series require separate training for each individual model”
— PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
“methods like Eagle~li2024eagle, which use an autoregressive approach, are more suitable for modeling Semantic Coherence. Using such methods to model Syntactic Coherence introduces unnecessary computational overhead.”
— S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models
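The first critique concerns the two terms in EAGLE's draft-head objective. Assuming the objective reported in the EAGLE paper, the draft head is trained with a feature-level regression loss (Smooth L1 between its predicted feature and the target model's feature) plus a logit-level cross-entropy loss, where both features are passed through the target model's frozen LM head and the classification term is weighted by 0.1. The sketch below combines the two for a single position using plain Python lists. It is illustrative only: every function name is hypothetical, there is no batching or autograd, and the 0.1 weight is an assumption carried over from the paper's reported setting.

```python
import math

def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss between two feature vectors (quadratic below beta, linear above)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def lm_head(feature, weight):
    """Frozen LM head: one row of `weight` per vocabulary entry, logits = weight @ feature."""
    return [sum(w * f for w, f in zip(row, feature)) for row in weight]

def cross_entropy(target_probs, pred_logits):
    """H(p, q) with p = target distribution and q = softmax(pred_logits)."""
    q = softmax(pred_logits)
    return -sum(p * math.log(qi) for p, qi in zip(target_probs, q) if p > 0)

def draft_head_loss(pred_feature, target_feature, lm_head_weight, w_cls=0.1):
    """Hypothetical EAGLE-style objective: feature regression + weighted logit-level CE.

    Assumption: w_cls = 0.1 mirrors the setting reported in the EAGLE paper.
    """
    l_reg = smooth_l1(pred_feature, target_feature)  # feature-level loss
    target_probs = softmax(lm_head(target_feature, lm_head_weight))
    pred_logits = lm_head(pred_feature, lm_head_weight)
    l_cls = cross_entropy(target_probs, pred_logits)  # logit-level loss
    return l_reg + w_cls * l_cls, l_reg, l_cls
```

With a toy three-token vocabulary such as `W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]`, calling `draft_head_loss([1.0, 2.0], [1.0, 2.0], W)` gives `l_reg == 0.0`, while `l_cls` equals the entropy of the target distribution, because cross-entropy against a soft target never falls below that entropy.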
Beaten on benchmarks
Head-to-head results where a newer method reports beating EAGLE. Values are copied from the source paper's tables — verify against the cited paper.
- EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
EAGLE-2 beats EAGLE · Average Acceptance Length [V 13B, Temperature=0]
4.83 vs 3.98
- SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding
SpecForge beats EAGLE · Throughput [Llama3.1-8B, 4096 seq length]
126639.6 vs 63015.4
- Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
HedgeSpec beats EAGLE · MAT and Token/s [LLaMA-3.1-8B-IT], HedgeSpec vs EAGLE per domain:
Python: MAT 7.69 vs 6.48 · Token/s 99.58 vs 87.37
Math: MAT 7.69 vs 5.88 · Token/s 98.63 vs 76.35
Biology: MAT 7.18 vs 5.95 · Token/s 93.78 vs 71.20
Chemistry: MAT 7.10 vs 5.28 · Token/s 89.65 vs 71.28
MedQA: MAT 6.47 vs 4.96 · Token/s 77.26 vs 66.48
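The margins above mix metrics on different scales (MAT, average acceptance length, Token/s, throughput), so comparing them is easier as relative gains. A minimal sketch in plain Python, using only the HedgeSpec-vs-EAGLE pairs copied from the Not-a-Bandit entry; the dictionary name and helper functions are illustrative and do not come from the cited paper.

```python
# (newer method, EAGLE) pairs copied from the HedgeSpec entry, LLaMA-3.1-8B-IT.
HEDGESPEC_VS_EAGLE = {
    "Python": {"MAT": (7.69, 6.48), "Token/s": (99.58, 87.37)},
    "Math": {"MAT": (7.69, 5.88), "Token/s": (98.63, 76.35)},
    "Biology": {"MAT": (7.18, 5.95), "Token/s": (93.78, 71.20)},
    "Chemistry": {"MAT": (7.10, 5.28), "Token/s": (89.65, 71.28)},
    "MedQA": {"MAT": (6.47, 4.96), "Token/s": (77.26, 66.48)},
}

def relative_gain(new: float, baseline: float) -> float:
    """Fractional improvement of `new` over `baseline`; 0.25 means +25%."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline

def gain_table(results):
    """Map each domain and metric to its relative gain over EAGLE."""
    return {
        domain: {metric: relative_gain(new, base) for metric, (new, base) in metrics.items()}
        for domain, metrics in results.items()
    }

if __name__ == "__main__":
    for domain, gains in gain_table(HEDGESPEC_VS_EAGLE).items():
        cells = ", ".join(f"{metric} +{gain:.1%}" for metric, gain in gains.items())
        print(f"{domain}: {cells}")
```

The same helper applies to the other entries; for example, `relative_gain(4.83, 3.98)` puts EAGLE-2's acceptance-length margin at roughly +21%.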
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation · May 30, 2026
- May 14, 2026
- SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding · Mar 19, 2026
- Mar 13, 2026
- Feb 17, 2026
- Oct 22, 2025
- Oct 22, 2025
- Oct 17, 2025
- Draft, Verify, and Improve: Toward Training-Aware Speculative Decoding (DVI) · Oct 6, 2025
- FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning · Sep 26, 2025