Annealed Relaxation of Speculative Decoding for Faster Autoregressive Image Generation

Xingyao Li, Fengzhuo Zhang, Cunxiao Du, Hui Ji

arXiv:2601.09212v11.5h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses the inference-speed bottleneck faced by users of autoregressive image generation models. It offers a theoretically grounded method that is an incremental advance over prior relaxed speculative decoding approaches.

The paper tackles slow inference in autoregressive image generation by proposing COOL-SD, an annealed relaxation of speculative decoding. COOL-SD generates images faster at comparable quality, or achieves better quality at similar latency. The reported experiments show consistent improvements in the speed-quality trade-off.

Despite significant progress in autoregressive (AR) image generation, inference remains slow due to the sequential nature of AR models and the ambiguity of image tokens, even when using speculative decoding (SD). Recent works attempt to address this with relaxed SD but lack theoretical grounding. In this paper, we establish the theoretical basis of relaxed SD and propose COOL-SD, an annealed relaxation of speculative decoding built on two key insights. The first analyzes the total variation (TV) distance between the target model and relaxed SD, and yields an optimal resampling distribution that minimizes an upper bound on this distance. The second uses perturbation analysis to reveal an annealing behaviour in relaxed SD, motivating our annealed design. Together, these insights enable COOL-SD to generate images faster with comparable quality, or to achieve better quality at similar latency. Experiments validate the effectiveness of COOL-SD, showing consistent improvements over prior methods in speed-quality trade-offs.
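
For readers unfamiliar with the terms used in the abstract, the sketch below illustrates the total variation distance and one step of standard (lossless) speculative sampling on a toy vocabulary. It is not COOL-SD: the relaxed acceptance rule, the optimal resampling distribution, and the annealing schedule are not specified on this page and are not reproduced here. The function names and the toy distributions `p` (target) and `q` (draft) are assumptions chosen for illustration.

```python
import random


def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def residual_distribution(p, q):
    """Normalized positive part of (p - q), used to resample after a rejection."""
    diff = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0.0:
        # p equals q, so a rejection cannot occur; fall back to p.
        return list(p)
    return [d / total for d in diff]


def speculative_step(p, q, rng):
    """One token of standard speculative sampling.

    Draw a candidate from the draft distribution q, accept it with
    probability min(1, p[x] / q[x]), and otherwise resample from the
    residual distribution. Returns (token, accepted).
    """
    x = rng.choices(range(len(q)), weights=q)[0]
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    r = residual_distribution(p, q)
    return rng.choices(range(len(r)), weights=r)[0], False


if __name__ == "__main__":
    # Hypothetical toy distributions over a 3-token vocabulary.
    p = [0.5, 0.3, 0.2]  # target model
    q = [0.2, 0.5, 0.3]  # draft model
    rng = random.Random(0)
    samples = [speculative_step(p, q, rng) for _ in range(20000)]
    accept_rate = sum(ok for _, ok in samples) / len(samples)
    print("TV(p, q):", tv_distance(p, q))
    print("empirical acceptance rate:", accept_rate)
```

In this exact scheme the output token is distributed according to `p`, and the acceptance probability equals one minus the TV distance between `p` and `q`. Relaxed variants, as described in the abstract, accept more draft tokens in exchange for a nonzero TV distance from the target model.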
