Is EAGLE-3 superseded?

EAGLE-3 (speculative decoding) is heavily superseded: it is a standard baseline that newer methods routinely beat. 13 papers critique it and 28 beat it on benchmarks (#1 of 151 most-superseded). Sub-problem: cluster led by EAGLE-3. Newer alternatives in the same sub-problem include D^2SD, TreeFlash, Hybrid Verified Decoding, Bastion, Draft-OPD.

Heavily superseded · #1 of 151 most-superseded

EAGLE-3

EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test

Speculative decoding · first seen Mar 3, 2025

heavily superseded — a standard baseline that newer methods routinely beat

13 papers critique it · 28 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites EAGLE-3 as a baseline.

“The method that relies on training an auxiliary draft model demonstrate promising acceleration potential, but its inference behavior is inherently coupled to the training process, limiting adaptability across diverse reasoning scenarios.”
— Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling
“State-of-the-art tree-based methods, such as eagle, construct the draft tree via a rigid layer-wise expansion mechanism.”
— TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees
“Conversely, we also observe significant instability, with performance dropping to a 0.70× slowdown in the worst-case scenario (Eagle-3 of L31-8B on SWE-bench).”
— An Empirical Study of Speculative Decoding on Software Engineering Tasks
“current autoregressive drafter designs introduce limitations. While approaches such as EAGLE3 significantly lower the per-step drafting cost by using a single customized layer, the drafting process is still inherently autoregressive. This sequential dependency forces the drafter to spend nearly 20%–40% of the total inference time, thereby fundamentally limiting the achievable acceleration, causing the drafting stage, especially the drafting forward cost, to emerge as a new bottleneck in speculative decoding.”
— DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference
“However, these approaches primarily focus on optimizing draft generation while the verification phase — which constitutes 67-90% of total computation — receives limited attention for fine-grained adaptation.”
— HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding
“It is plausible that EAGLE-3, trained primarily on English corpora, fails to simulate the target model's distribution accurately in Chinese reasoning tasks. Such data distribution mismatches, rooted in differences in post-training procedures or training corpora, generally constrain the robustness of model-based SD methods.”
— RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding
“each added tree depth still costs one more sequential drafter round”
— SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting
“Training-based method EAGLE-3 suffers significant degradation under OOD conditions.”
— Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match
“existing draft models, including the EAGLE series, have a shared drawback: they generate draft tokens by conditioning solely on the current prefix. This design is prone to error accumulation.”
— ConFu: Contemplate the Future for Better Speculative Sampling
“we find that the high acceptance length of EAGLE3 is not viable when the context length is beyond the trained context window”
— OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs
“state-of-the-art methods like EAGLE-3 still rely on autoregressive drafting. This serial drafting process is not only inherently inefficient but also susceptible to error accumulation, which effectively caps achievable speedups at approximately 2-3×”
— DFlash: Block Diffusion for Flash Speculative Decoding
“However, these methods are typically evaluated at 2K context with full-cache drafting [liu2026illusion], and do not address the sparse/full mismatch when a sparse KV cache constrains the drafter in longer contexts [yang2025longspec].”
— BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding
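
Several of the critiques above (DART, SpecBlock, DFlash) turn on the same arithmetic: every extra draft depth adds one sequential drafter pass, so drafting time eats into the speedup. The sketch below is a deliberately simple cost model, not taken from any of the cited papers. It assumes one speculation cycle costs `draft_steps` drafter passes plus one target verification pass, that verification costs about the same as a single target decode step, and that each cycle yields `accept_len` tokens on average. The function name, parameters, and example values are all hypothetical.

```python
def speculative_speedup(accept_len: float, draft_steps: int, draft_cost: float):
    """Return (speedup, draft_fraction) under a simple per-cycle cost model.

    Assumptions (illustrative only, not from the cited papers):
      - time is measured in units of one target-model forward pass
      - each cycle runs `draft_steps` sequential drafter passes,
        each costing `draft_cost` target-pass units
      - verification costs exactly one target-pass unit
      - each cycle produces `accept_len` tokens on average
    """
    if accept_len <= 0 or draft_steps < 0 or draft_cost < 0:
        raise ValueError("accept_len must be > 0; draft_steps and draft_cost must be >= 0")

    draft_time = draft_steps * draft_cost   # sequential drafter passes
    cycle_time = draft_time + 1.0           # plus one verification pass
    speedup = accept_len / cycle_time       # tokens per target-pass unit vs. 1 for plain decoding
    draft_fraction = draft_time / cycle_time
    return speedup, draft_fraction


if __name__ == "__main__":
    # Hypothetical inputs: 4 tokens accepted per cycle, 5 draft passes,
    # each costing 10% of a target pass.
    speedup, fraction = speculative_speedup(accept_len=4.0, draft_steps=5, draft_cost=0.1)
    print(f"speedup ~ {speedup:.2f}x, drafting share ~ {fraction:.0%}")
```

Under this model, raising `draft_steps` only helps if `accept_len` grows faster than the added drafting time, which is the trade-off the block-parallel and diffusion-style drafters cited above try to sidestep.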

Beaten on benchmarks

Head-to-head results where a newer method reports beating EAGLE-3. Values are copied from the source paper's tables — verify against the cited paper.

Source paper for all rows: Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding
Metric: TPS (Domino vs EAGLE-3; Domino is higher in every row)

Model     Dataset  Concurrency  Domino  EAGLE-3
Qwen3-4B  GSM8K    2            1256    453
Qwen3-4B  GSM8K    4            2202    832
Qwen3-4B  GSM8K    8            3441    1375
Qwen3-4B  GSM8K    16           4467    1839
Qwen3-4B  GSM8K    32           5509    2170
Qwen3-4B  MBPP     2            968     407
Qwen3-4B  MBPP     4            1654    740
Qwen3-4B  MBPP     8            2651    1186
Qwen3-4B  MBPP     16           3422    1597
Qwen3-4B  MBPP     32           4290    1892
Qwen3-8B  GSM8K    2            942     324
Qwen3-8B  GSM8K    4            1703    598
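
The raw TPS pairs above are easier to compare as ratios. The short script below is a convenience sketch: the numbers are copied from the entries above, while the dictionary layout and the `speedup_ratios` helper are illustrative names, not part of the source paper. On these twelve entries the Domino-to-EAGLE-3 ratio falls roughly between 2.1× and 2.9×.

```python
# (model, dataset, concurrency) -> (Domino TPS, EAGLE-3 TPS), copied from the entries above
results = {
    ("Qwen3-4B", "GSM8K", 2): (1256, 453),
    ("Qwen3-4B", "GSM8K", 4): (2202, 832),
    ("Qwen3-4B", "GSM8K", 8): (3441, 1375),
    ("Qwen3-4B", "GSM8K", 16): (4467, 1839),
    ("Qwen3-4B", "GSM8K", 32): (5509, 2170),
    ("Qwen3-4B", "MBPP", 2): (968, 407),
    ("Qwen3-4B", "MBPP", 4): (1654, 740),
    ("Qwen3-4B", "MBPP", 8): (2651, 1186),
    ("Qwen3-4B", "MBPP", 16): (3422, 1597),
    ("Qwen3-4B", "MBPP", 32): (4290, 1892),
    ("Qwen3-8B", "GSM8K", 2): (942, 324),
    ("Qwen3-8B", "GSM8K", 4): (1703, 598),
}


def speedup_ratios(rows):
    """Map each configuration to Domino TPS divided by EAGLE-3 TPS."""
    return {key: domino / eagle3 for key, (domino, eagle3) in rows.items()}


ratios = speedup_ratios(results)

if __name__ == "__main__":
    for (model, dataset, concurrency), ratio in ratios.items():
        print(f"{model:9s} {dataset:6s} concurrency={concurrency:<3d} {ratio:.2f}x")
```

As the page itself notes, these values should be checked against the tables in the cited paper before they are relied on.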

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.