EAGLE-3
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Speculative decoding · first seen Mar 3, 2025
heavily superseded — a standard baseline that newer methods routinely beat
13 papers critique it · 28 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites EAGLE-3 as a baseline.
“The method that relies on training an auxiliary draft model demonstrate promising acceleration potential, but its inference behavior is inherently coupled to the training process, limiting adaptability across diverse reasoning scenarios.”
— Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling
“State-of-the-art tree-based methods, such as eagle, construct the draft tree via a rigid layer-wise expansion mechanism.”
— TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees
“Conversely, we also observe significant instability, with performance dropping to a 0.70× slowdown in the worst-case scenario (Eagle-3 of L31-8B on SWE-bench).”
— An Empirical Study of Speculative Decoding on Software Engineering Tasks
“current autoregressive drafter designs introduce limitations. While approaches such as EAGLE3 significantly lower the per-step drafting cost by using a single customized layer, the drafting process is still inherently autoregressive. This sequential dependency forces the drafter to spend nearly 20%-40% of the total inference time, thereby fundamentally limiting the achievable acceleration, causing the drafting stage, especially the drafting forward cost, to emerge as a new bottleneck in speculative decoding.”
— DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference
“However, these approaches primarily focus on optimizing draft generation while the verification phase — which constitutes 67-90% of total computation — receives limited attention for fine-grained adaptation.”
— HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding
“It is plausible that EAGLE-3, trained primarily on English corpora, fails to simulate the target model's distribution accurately in Chinese reasoning tasks. Such data distribution mismatches, rooted in differences in post-training procedures or training corpora, generally constrain the robustness of model-based SD methods.”
— RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding
“each added tree depth still costs one more sequential drafter round”
— SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting
“Training-based method EAGLE-3 suffers significant degradation under OOD conditions.”
— Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match
“existing draft models, including the EAGLE series, have a shared drawback: they generate draft tokens by conditioning solely on the current prefix. This design is prone to error accumulation.”
— ConFu: Contemplate the Future for Better Speculative Sampling
“we find that the high acceptance length of EAGLE3 is not viable when the context length is beyond the trained context window”
— OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs
“state-of-the-art methods like EAGLE-3 still rely on autoregressive drafting. This serial drafting process is not only inherently inefficient but also susceptible to error accumulation, which effectively caps achievable speedups at approximately 2-3×”
— DFlash: Block Diffusion for Flash Speculative Decoding
“However, these methods are typically evaluated at 2K context with full-cache drafting [liu2026illusion], and do not address the sparse/full mismatch when a sparse KV cache constrains the drafter in longer contexts [yang2025longspec].”
— BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding
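To put the drafting-cost critique in perspective, the following is a minimal sketch of an Amdahl-style bound. It is not taken from any of the cited papers. It assumes, purely for illustration, that the drafting stage could be made free while everything else stays unchanged; the function name `max_gain_if_stage_free` is hypothetical. The only inputs are the 20%-40% drafting share quoted from DART above.

```python
def max_gain_if_stage_free(stage_fraction: float) -> float:
    """Amdahl-style upper bound on end-to-end gain if a stage that takes
    `stage_fraction` of total time were reduced to zero cost.

    Assumption (illustrative only): the rest of the pipeline is unaffected.
    """
    if not 0.0 <= stage_fraction < 1.0:
        raise ValueError("stage_fraction must be in [0, 1)")
    return 1.0 / (1.0 - stage_fraction)


# Drafting-time shares quoted in the DART critique (20% to 40% of total time).
for share in (0.20, 0.40):
    bound = max_gain_if_stage_free(share)
    print(f"drafting share {share:.0%}: at most {bound:.2f}x additional gain")
```

Under this simplification, eliminating a 20% drafting share yields at most a 1.25x further gain, and eliminating a 40% share at most about 1.67x. Real systems may differ, since a cheaper drafter can also change acceptance length and verification cost.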
Beaten on benchmarks
Head-to-head results where a newer method reports beating EAGLE-3. Values are copied from the source paper's tables — verify against the cited paper.
- Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding
Domino beats EAGLE-3 · TPS [Qwen3-4B, GSM8K]
concurrency=2: 1256 vs 453
concurrency=4: 2202 vs 832
concurrency=8: 3441 vs 1375
concurrency=16: 4467 vs 1839
concurrency=32: 5509 vs 2170
Domino beats EAGLE-3 · TPS [Qwen3-4B, MBPP]
concurrency=2: 968 vs 407
concurrency=4: 1654 vs 740
concurrency=8: 2651 vs 1186
concurrency=16: 3422 vs 1597
concurrency=32: 4290 vs 1892
Domino beats EAGLE-3 · TPS [Qwen3-8B, GSM8K]
concurrency=2: 942 vs 324
concurrency=4: 1703 vs 598
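The raw TPS pairs above are easier to compare as ratios. The sketch below computes Domino TPS divided by EAGLE-3 TPS for each listed row. It assumes the first number in each "X vs Y" pair is Domino and the second is EAGLE-3, as the "Domino beats EAGLE-3" label suggests; the names `RESULTS` and `speedup` are illustrative, not from the source.

```python
# (model, benchmark, concurrency, domino_tps, eagle3_tps)
# Values copied from the entries listed above; the ordering of each pair
# (Domino first, EAGLE-3 second) is an assumption based on the row label.
RESULTS = [
    ("Qwen3-4B", "GSM8K", 2, 1256, 453),
    ("Qwen3-4B", "GSM8K", 4, 2202, 832),
    ("Qwen3-4B", "GSM8K", 8, 3441, 1375),
    ("Qwen3-4B", "GSM8K", 16, 4467, 1839),
    ("Qwen3-4B", "GSM8K", 32, 5509, 2170),
    ("Qwen3-4B", "MBPP", 2, 968, 407),
    ("Qwen3-4B", "MBPP", 4, 1654, 740),
    ("Qwen3-4B", "MBPP", 8, 2651, 1186),
    ("Qwen3-4B", "MBPP", 16, 3422, 1597),
    ("Qwen3-4B", "MBPP", 32, 4290, 1892),
    ("Qwen3-8B", "GSM8K", 2, 942, 324),
    ("Qwen3-8B", "GSM8K", 4, 1703, 598),
]


def speedup(domino_tps: float, eagle3_tps: float) -> float:
    """Throughput ratio of Domino over EAGLE-3 for one row."""
    return domino_tps / eagle3_tps


ratios = [speedup(d, e) for _, _, _, d, e in RESULTS]

for (model, bench, conc, d, e), ratio in zip(RESULTS, ratios):
    print(f"{model:9s} {bench:6s} concurrency={conc:<3d} {d:5d} vs {e:5d}  ratio={ratio:.2f}x")

print(f"min ratio: {min(ratios):.2f}x, max ratio: {max(ratios):.2f}x")
```

On these twelve rows the ratio stays between roughly 2.1x and 2.9x. These are ratios of the reported throughputs only and say nothing about settings the source table does not list.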
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Jun 3, 2026
- Jun 2, 2026
- Hybrid Verified Decoding · Hybrid Verified Decoding: Learning to Allocate Verification in Speculative Decoding · May 31, 2026
- May 28, 2026
- May 28, 2026
- May 28, 2026
- May 19, 2026
- May 19, 2026
- May 9, 2026
- May 8, 2026
- May 1, 2026
- Apr 21, 2026