dVoting: Fast Voting for dLLMs
This work addresses the inefficiency of parallel test-time scaling for dLLMs and gives AI researchers and practitioners a training-free way to boost reasoning. The contribution is incremental, since it builds on generation capabilities that dLLMs already have.
The paper tackles the problem of improving reasoning in diffusion large language models (dLLMs) without training. It introduces dVoting, a fast voting technique that uses the ability of dLLMs to generate tokens at arbitrary positions in parallel to iteratively regenerate only the tokens that are inconsistent across samples. Reported gains reach up to 7.66% on GSM8K, 7.20% on MATH500, 14.84% on ARC-C, and 5.74% on MMLU.
Diffusion Large Language Models (dLLMs) represent a new paradigm beyond autoregressive modeling, offering competitive performance while naturally enabling a flexible decoding process. Specifically, dLLMs can generate tokens at arbitrary positions in parallel, endowing them with significant potential for parallel test-time scaling, which was previously constrained by severe inefficiency in autoregressive modeling. In this work, we introduce dVoting, a fast voting technique that boosts reasoning capability without training, with only an acceptable extra computational overhead. dVoting is motivated by the observation that, across multiple samples for the same prompt, token predictions remain largely consistent, whereas performance is determined by a small subset of tokens exhibiting cross-sample variability. Leveraging the arbitrary-position generation capability of dLLMs, dVoting performs iterative refinement by sampling, identifying uncertain tokens via consistency analysis, regenerating them through voting, and repeating this process until convergence. Extensive evaluations demonstrate that dVoting consistently improves performance across various benchmarks. It achieves gains of 6.22%-7.66% on GSM8K, 4.40%-7.20% on MATH500, 3.16%-14.84% on ARC-C, and 4.83%-5.74% on MMLU. Our code is available at https://github.com/fscdc/dVoting
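The loop described in the abstract (sample, find inconsistent positions, regenerate them, repeat until convergence) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is in the linked repository. Everything named here is an assumption made for illustration: `fill_masked` is a hypothetical stand-in for a dLLM call that fills the masked positions of a fixed-length sequence, `MASK` is a placeholder value, "uncertain" is taken to mean "agreement below `threshold` across samples", and the final answer is taken to be the per-position majority vote.

```python
from collections import Counter

MASK = None  # hypothetical placeholder for a position that still has to be generated


def consistency_split(samples, threshold=1.0):
    """Majority-vote each position and list the positions whose agreement is below threshold.

    `samples` is a list of equal-length token sequences for the same prompt.
    """
    n = len(samples)
    consensus, uncertain = [], []
    for pos, column in enumerate(zip(*samples)):
        token, count = Counter(column).most_common(1)[0]
        consensus.append(token)
        if count / n < threshold:
            uncertain.append(pos)
    return consensus, uncertain


def iterative_vote(fill_masked, length, num_samples=4, max_rounds=5, threshold=1.0):
    """Sketch of sample -> consistency analysis -> regenerate uncertain tokens -> repeat.

    `fill_masked(template)` is an assumed callable that returns a full sequence,
    keeping every non-MASK entry of `template` and generating the MASK entries.
    """
    template = [MASK] * length
    consensus = list(template)
    for _ in range(max_rounds):
        # Draw several candidate completions of the current template.
        samples = [fill_masked(template) for _ in range(num_samples)]
        consensus, uncertain = consistency_split(samples, threshold)
        if not uncertain:
            break  # all samples agree everywhere: treated as convergence
        # Keep tokens the samples agree on; re-mask only the inconsistent ones.
        uncertain_set = set(uncertain)
        template = [MASK if i in uncertain_set else tok for i, tok in enumerate(consensus)]
    return consensus
```

In this sketch, positions that all samples agree on are frozen after the first round, so later rounds only spend generation on the small inconsistent subset. That mirrors the motivation stated in the abstract, though the stopping rule and the way the actual method regenerates tokens may differ.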