CL · AI · Apr 4

Differences in Text Generated by Diffusion and Autoregressive Language Models

arXiv:2605.12522
AI Analysis

For researchers developing text generation models, this work identifies the distinct contributions of training objectives and decoding algorithms to differences between DLMs and ARMs, informing future DLM design.

Diffusion language models (DLMs) produce text with lower n-gram entropy, higher semantic coherence, and higher semantic diversity than autoregressive models (ARMs). Controlled experiments show that the DLM training objective drives the coherence and diversity gains, while the entropy reduction stems from decoding algorithms, particularly confidence-based remasking.

Diffusion language models (DLMs) are promising alternatives to autoregressive language models (ARMs), yet the intrinsic differences in their generated text remain underexplored. We first find empirically that off-the-shelf DLMs exhibit lower $n$-gram entropy, higher semantic coherence, and higher semantic diversity. To understand the cause, we conduct controlled experiments that decouple the effects of training objectives and decoding algorithms. Results suggest that the DLM training objective contributes to the increases in semantic coherence and semantic diversity, but has a minor influence on entropy. These differences are primarily driven by the bidirectional context; other components in the training objective, such as input masking, label masking, and the weighting function, have a much weaker influence. Further, our experiments demonstrate that the reduction in entropy stems from DLMs' decoding algorithms, particularly confidence-based remasking strategies. We provide a theoretical understanding for this entropy reduction phenomenon. Together, our work uncovers key mechanisms underlying the differences between DLMs and ARMs in text generation, and informs future design of training objectives and decoding algorithms in DLMs.
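
To make the "n-gram entropy" measure concrete, here is a minimal sketch that computes the Shannon entropy of the empirical n-gram distribution of a token sequence. This excerpt does not give the paper's exact definition, so the formula, the whitespace tokenization, the function name `ngram_entropy`, and the two toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngram_entropy(tokens, n=2):
    """Shannon entropy (in bits) of the empirical n-gram distribution.

    Assumption: this is a generic definition, not necessarily the exact
    metric used in the paper.
    """
    # Slide a window of length n over the token sequence.
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    total = len(ngrams)
    # H = -sum_g p(g) * log2 p(g), with p(g) the relative frequency of n-gram g.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy inputs: repeated phrasing should score lower than varied phrasing.
repetitive = "the cat sat the cat sat the cat sat".split()
varied = "the cat sat on a warm mat near the door".split()

print(ngram_entropy(repetitive, n=2))
print(ngram_entropy(varied, n=2))
```

Under this definition, text that reuses the same short phrases concentrates probability mass on a few n-grams and scores lower, which is the sense in which lower n-gram entropy indicates more repetitive surface form.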
