EAGLE (Speculative decoding): heavily superseded, a standard baseline that newer methods routinely beat. 19 papers critique it and 12 beat it on benchmarks, placing it #3 of 151 most-superseded. Sub-problem: cluster led by EAGLE-2. Newer alternatives in the same sub-problem include DREAM-S, PPOW, SpecForge, OnlineSpec, MoE-Spec.

EAGLE

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Speculative decoding · first seen Jan 26, 2024

19 papers critique it · 12 beat it on benchmarks
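
For background on the title: speculative sampling keeps decoding lossless by checking each drafted token against the target model. Below is a minimal sketch of the standard single-token acceptance rule: accept a drafted token x with probability min(1, p(x)/q(x)), otherwise resample from the normalized residual max(0, p − q), where p is the target distribution and q the draft distribution. This is the generic rule, not EAGLE's implementation. EAGLE verifies a tree of drafted tokens, which this sketch does not model, and the function name `verify_draft_token` is hypothetical.

```python
import random


def verify_draft_token(token, p, q, rng=random):
    """Hypothetical helper: standard speculative-sampling check for one drafted token.

    p: target-model probabilities over the vocabulary (list of floats)
    q: draft-model probabilities the token was sampled from
    Returns (accepted, emitted_token).
    """
    # Accept the drafted token with probability min(1, p[x] / q[x]).
    accept_prob = min(1.0, p[token] / q[token]) if q[token] > 0 else 0.0
    if rng.random() < accept_prob:
        return True, token
    # On rejection, resample from the residual distribution norm(max(0, p - q)).
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(residual)
    if total == 0.0:
        # Guard against rounding: if no residual mass remains, sample from p.
        residual, total = list(p), sum(p)
    r = rng.random() * total
    cumulative = 0.0
    for idx, weight in enumerate(residual):
        cumulative += weight
        if r < cumulative:
            return False, idx
    return False, len(residual) - 1
```

Over many draws, the emitted tokens follow p rather than q, which is why draft quality affects speed (the acceptance rate) but not output quality.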

What papers say

Verbatim critique sentences, each from a paper that cites EAGLE as a baseline.

“there exists a conflict between the feature-level and the logit-level losses. The feature-level loss is introduced to facilitate knowledge distillation. However, to the best of our knowledge, there is currently no research indicating that the knowledge of an LLM can be distilled into a single transformer decoder layer. We believe that performing strict knowledge distillation between the target LLM and a lightweight draft model is unrealistic.”
— Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation
“these methods still necessitate a computationally expensive projection over the full vocabulary at the draft head's final layer, which persists as a major latency source”
— EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation
“the top-$k$ tokens within the same layer of the draft tree remain coupled, which limits the diversity and potential of the predictions.”
— Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE
“The shape of EAGLE's draft tree is fixed, with the drafting phase filling in the corresponding positions. EAGLE-2 aims to improve this by introducing a dynamically adjustable draft tree.”
— EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
“Tree-attention frameworks—SpecInfer~miao2024specinfer, Medusa~cai2024medusa, and Eagle~li2024eagle, fan2026flatter—expand many branches, quickly exhausting memory.”
— Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
“its structure is static with no dependence on the draft model output”
— Dynamic Depth Decoding: Faster Speculative Decoding for LLMs
“Due to the sampling results at the token layer being hidden, feature-level autoregression introduces uncertainty.”
— EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
“For MoE models, however, this assumption breaks down; we analyze this problem in sec:method:problem.”
— MoE-Spec: Expert Budgeting for Efficient Speculative Decoding
“only EAGLE~li2024eagle presents results for batch sizes $\leq 4$ but doesn't discuss larger batch sizes.”
— EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models
“Prevailing methods, medusa, li2024eagle, vicuna68m use small drafters simply trained on datasets such as ShareGPT sharegpt which is often used for instruction tuning of LLMs to learn a pattern of target LLM's language modeling. However, our investigations reveal that such approaches are insufficient for multilingual translation.”
— Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
“target-dependent methods such as the EAGLE series require separate training for each individual model”
— PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
“methods like Eagle~li2024eagle, which use an autoregressive approach, are more suitable for modeling Semantic Coherence. Using such methods to model Syntactic Coherence introduces unnecessary computational overhead.”
— S$^4$C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models
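
The first critique concerns the two losses used to train EAGLE's draft head. As we read the EAGLE paper, the objective combines a feature-level regression term (Smooth L1 between the predicted and the actual next hidden feature) with a weighted logit-level cross-entropy against the target model's next-token distribution. The following is a minimal pure-Python sketch of that combined objective, assuming plain lists for vectors. The function names and the default weight `w_cls=0.1` are illustrative assumptions, not code from the paper.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss between a predicted and a target feature vector."""
    total = 0.0
    for a, b in zip(pred, target):
        d = abs(a - b)
        # Quadratic near zero, linear beyond beta.
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def cross_entropy(target_probs, draft_logits):
    """Cross-entropy of the draft's softmax against the target distribution."""
    m = max(draft_logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(z - m) for z in draft_logits))
    return -sum(p * (z - log_z) for p, z in zip(target_probs, draft_logits))


def draft_loss(pred_feat, target_feat, target_probs, draft_logits, w_cls=0.1):
    """Hypothetical combined objective: feature regression plus weighted token loss."""
    return smooth_l1(pred_feat, target_feat) + w_cls * cross_entropy(target_probs, draft_logits)
```

In this form the tension the critique describes shows up only through `w_cls`: raising it pushes the draft head toward matching token distributions, lowering it toward reproducing the target model's features.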

Beaten on benchmarks

Head-to-head results where a newer method reports beating EAGLE. Values are copied from the source paper's tables — verify against the cited paper.

EAGLE-2 beats EAGLE · Average Acceptance Length [Vicuna 13B, Temperature=0]
4.83 vs 3.98
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
SpecForge beats EAGLE · Throughput [Llama3.1-8B, 4096 seq length]
126639.6 vs 63015.4
SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding
HedgeSpec beats EAGLE · MAT [LLaMA-3.1-8B-IT Python]
7.69 vs 6.48
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
HedgeSpec beats EAGLE · Token/s [LLaMA-3.1-8B-IT Python]
99.58 vs 87.37
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
HedgeSpec beats EAGLE · MAT [LLaMA-3.1-8B-IT Math]
7.69 vs 5.88
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
HedgeSpec beats EAGLE · Token/s [LLaMA-3.1-8B-IT Math]
98.63 vs 76.35
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
HedgeSpec beats EAGLE · MAT [LLaMA-3.1-8B-IT Biology]
7.18 vs 5.95
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
HedgeSpec beats EAGLE · Token/s [LLaMA-3.1-8B-IT Biology]
93.78 vs 71.20
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
HedgeSpec beats EAGLE · MAT [LLaMA-3.1-8B-IT Chemistry]
7.10 vs 5.28
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
HedgeSpec beats EAGLE · Token/s [LLaMA-3.1-8B-IT Chemistry]
89.65 vs 71.28
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
HedgeSpec beats EAGLE · MAT [LLaMA-3.1-8B-IT MedQA]
6.47 vs 4.96
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
HedgeSpec beats EAGLE · Token/s [LLaMA-3.1-8B-IT MedQA]
77.26 vs 66.48
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
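
The pairs above are easier to compare as ratios. This small sketch turns each head-to-head result into a new/EAGLE ratio. The values are copied from the entries above; the ratios are our own arithmetic, not figures reported by the papers, and they are only comparable within one metric and setting, since different papers may use different hardware and configurations.

```python
# Head-to-head pairs copied from the entries above: (label, newer method, EAGLE).
RESULTS = [
    ("EAGLE-2 avg acceptance length (13B, T=0)", 4.83, 3.98),
    ("SpecForge throughput (Llama3.1-8B, 4096 seq)", 126639.6, 63015.4),
    ("HedgeSpec MAT, Python", 7.69, 6.48),
    ("HedgeSpec Token/s, Python", 99.58, 87.37),
    ("HedgeSpec MAT, Math", 7.69, 5.88),
    ("HedgeSpec Token/s, Math", 98.63, 76.35),
    ("HedgeSpec MAT, Biology", 7.18, 5.95),
    ("HedgeSpec Token/s, Biology", 93.78, 71.20),
    ("HedgeSpec MAT, Chemistry", 7.10, 5.28),
    ("HedgeSpec Token/s, Chemistry", 89.65, 71.28),
    ("HedgeSpec MAT, MedQA", 6.47, 4.96),
    ("HedgeSpec Token/s, MedQA", 77.26, 66.48),
]


def relative_gain(new, baseline):
    """Ratio of the newer method's value to EAGLE's (higher is better for these metrics)."""
    return new / baseline


for label, new, base in RESULTS:
    print(f"{label}: {relative_gain(new, base):.2f}x over EAGLE")
```

Read this way, SpecForge's throughput is roughly 2.01x EAGLE's, HedgeSpec's MAT gains range from about 1.19x (Python) to 1.34x (Chemistry), and its Token/s gains from about 1.14x (Python) to 1.32x (Biology).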

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.