LG AI CL
Feb 13, 2025

Non-Markovian Discrete Diffusion with Causal Language Models

Yangtian Zhang, Sizhuang He, Daniel Levine, Lawrence Zhao, David Zhang, Syed A Rizvi, Shiyang Zhang, Emanuele Zappala, Rex Ying, David van Dijk

arXiv:2502.09767v3 | 14.45 citations | h-index: 6

Originality: Highly original

AI Analysis

This work addresses the limited expressive power of discrete diffusion models for natural language generation, a gap that matters to researchers and developers in natural language processing.

The authors introduce CaDDi, a non-Markovian discrete diffusion model that outperforms state-of-the-art discrete diffusion baselines on natural-language benchmarks and substantially narrows the gap to large autoregressive transformers.

Discrete diffusion models offer a flexible, controllable approach to structured sequence generation, yet they still lag behind causal language models in expressive power. A key limitation lies in their reliance on the Markovian assumption, which restricts each step to condition only on the current state, leading to potential uncorrectable error accumulation. In this paper, we introduce CaDDi (Causal Discrete Diffusion Model), a discrete diffusion model that conditions on the entire generative trajectory, thereby lifting the Markov constraint and allowing the model to revisit and improve past states. By unifying sequential (causal) and temporal (diffusion) reasoning in a single non-Markovian transformer, CaDDi also treats standard causal language models as a special case and permits the direct reuse of pretrained LLM weights with no architectural changes. Empirically, CaDDi outperforms state-of-the-art discrete diffusion baselines on natural-language benchmarks, substantially narrowing the remaining gap to large autoregressive transformers.
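
The contrast the abstract draws between Markovian and trajectory-conditioned generation can be written as two factorizations of the reverse (denoising) process. This is a sketch in generic diffusion notation, not necessarily the paper's own: $x_T$ is assumed to be the fully noised sequence, $x_0$ the clean sequence, and $p_\theta$ the learned denoiser.

```latex
% Markovian reverse process: each step conditions only on the current state x_t
p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)

% Non-Markovian reverse process: each step conditions on the whole trajectory
% generated so far, x_T, ..., x_t (assumed notation)
p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t, x_{t+1}, \dots, x_T)
```

In the second form, an error introduced at one step remains visible alongside the earlier states at every later step, which is what gives the model a chance to correct it rather than carry it forward.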
