CL LGAug 19, 2025

DPad: Efficient Diffusion Language Models with Suffix Dropout

Xinhua Chen, Sitao Huang, Cong Guo, Chiyue Wei, Yintao He, Jianyi Zhang, Hai "Helen" Li, Yiran Chen

arXiv:2508.14148v225.431 citationsh-index: 11Has Code

Originality Incremental advance

AI Analysis

This addresses efficiency issues for users of diffusion language models in long-sequence inference, offering a significant speed improvement with minimal accuracy loss.

The paper tackled the high computational overhead in diffusion-based large language models by proposing DPad, a training-free method that restricts attention to nearby suffix tokens, achieving up to 61.4x speedup while maintaining comparable accuracy.

Diffusion-based Large Language Models (dLLMs) parallelize text generation by framing decoding as a denoising process, but suffer from high computational overhead since they predict all future suffix tokens at each step while retaining only a small fraction. We propose Diffusion Scratchpad (DPad), a training-free method that restricts attention to a small set of nearby suffix tokens, preserving fidelity while eliminating redundancy. DPad integrates two strategies: (i) a sliding window, which maintains a fixed-length suffix window, and (ii) distance-decay dropout, which deterministically removes distant suffix tokens before attention computation. This simple design is compatible with existing optimizations such as prefix caching and can be implemented with only a few lines of code. Comprehensive evaluations across multiple benchmarks on LLaDA-1.5 and Dream models demonstrate that DPad delivers up to $\mathbf{61.4\times}$ speedup over vanilla dLLMs while maintaining comparable accuracy, highlighting its potential for efficient and scalable long-sequence inference. Our code is available at https://github.com/Crys-Chen/DPad.

View on arXiv PDF Code

Similar