AIJan 9

Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding

Yuxuan Zhou, Fei Huang, Heng Li, Fengyi Wu, Tianyu Wang, Jianwei Zhang, Junyang Lin, Zhi-Qi Cheng

arXiv:2601.05724v21 citationsh-index: 4Has Code

Originality Highly original

AI Analysis

This addresses a key problem in improving inference speed for large language models without compromising distribution fidelity, representing a strong specific gain rather than a foundational breakthrough.

The paper tackles the bottleneck of verification in speculative decoding by proposing Hierarchical Speculative Decoding (HSD), a lossless method that overcomes joint intractability, resulting in a 12% performance gain when integrated into EAGLE-3.

Verification is a key bottleneck in improving inference speed while maintaining distribution fidelity in Speculative Decoding. Recent work has shown that sequence-level verification leads to a higher number of accepted tokens compared to token-wise verification. However, existing solutions often rely on surrogate approximations or are constrained by partial information, struggling with joint intractability. In this work, we propose Hierarchical Speculative Decoding (HSD), a provably lossless verification method that significantly boosts the expected number of accepted tokens and overcomes joint intractability by balancing excess and deficient probability mass across accessible branches. Our extensive large-scale experiments demonstrate that HSD yields consistent improvements in acceptance rates across diverse model families and benchmarks. Moreover, its strong explainability and generality make it readily integrable into a wide range of speculative decoding frameworks. Notably, integrating HSD into EAGLE-3 yields over a 12% performance gain, establishing state-of-the-art decoding efficiency without compromising distribution fidelity. Code is available at https://github.com/ZhouYuxuanYX/Hierarchical-Speculative-Decoding.

View on arXiv PDF Code

Similar