CL AI Jan 19

Autoregressive Models Rival Diffusion Models at ANY-ORDER Generation

arXiv:2601.13228v1
3 citations
Originality: Highly original
AI Analysis

This work addresses the need for flexible and high-quality language generation for tasks such as infilling and rewriting, offering a novel paradigm that combines the strengths of autoregressive and diffusion models.

The authors tackled the problem of diffusion language models having lower sample quality and stability than autoregressive models by proposing A3, a framework that extends autoregressive modeling for any-order generation, and demonstrated that A3 outperforms diffusion-based models in tasks like question answering and story infilling.

Diffusion language models enable any-order generation and bidirectional conditioning, offering appealing flexibility for tasks such as infilling, rewriting, and self-correction. However, their formulation (predicting one part of a sequence from another within a single-step dependency) limits modeling depth and often yields lower sample quality and stability than autoregressive (AR) models. To address this, we revisit autoregressive modeling as a foundation and reformulate diffusion-style training into a structured multi-group prediction process. We propose Any-order Any-subset Autoregressive modeling (A3), a generalized framework that extends the standard AR factorization to arbitrary token groups and generation orders. A3 preserves the probabilistic rigor and multi-layer dependency modeling of AR while inheriting diffusion models' flexibility for parallel and bidirectional generation. We implement A3 through a two-stream attention architecture and a progressive adaptation strategy that transitions pretrained AR models toward any-order prediction. Experiments on question answering, commonsense reasoning, and story infilling demonstrate that A3 outperforms diffusion-based models while maintaining flexible decoding. This work offers a unified approach for a flexible, efficient, and novel language modeling paradigm.
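The idea of extending the AR factorization to arbitrary token groups and orders can be illustrated with the chain rule alone. The sketch below is not the authors' implementation: it uses a hypothetical toy joint distribution over four binary tokens (no neural network), and the helper names `make_joint`, `marginal`, and `groupwise_log_prob` are assumptions made for illustration. It shows only that the group-wise product of conditionals p(x_G1) · p(x_G2 | x_G1) · ... recovers the same sequence probability for any ordered partition of the positions, with standard left-to-right AR as the special case of singleton groups. How A3 parameterizes each group conditional is not described in the abstract and is not modeled here.

```python
import itertools
import math
import random


def make_joint(n_tokens, vocab=2, seed=0):
    """Hypothetical toy joint distribution: a normalized random table over all sequences."""
    rng = random.Random(seed)
    seqs = list(itertools.product(range(vocab), repeat=n_tokens))
    weights = [rng.random() + 1e-3 for _ in seqs]
    total = sum(weights)
    return {s: w / total for s, w in zip(seqs, weights)}


def marginal(joint, x, positions):
    """Probability that the given positions match x, summing out all other positions."""
    return sum(p for s, p in joint.items() if all(s[i] == x[i] for i in positions))


def groupwise_log_prob(joint, x, groups):
    """Sum of log p(x_G | x_previous_groups) over an ordered partition of positions."""
    seen = []
    total = 0.0
    for group in groups:
        numerator = marginal(joint, x, seen + list(group))
        denominator = marginal(joint, x, seen)  # equals 1.0 when nothing is seen yet
        total += math.log(numerator / denominator)
        seen += list(group)
    return total


joint = make_joint(4)
x = (1, 0, 1, 1)

left_to_right = [[0], [1], [2], [3]]   # standard AR: one token per step, fixed order
any_order_groups = [[2], [0, 3], [1]]  # arbitrary order, one step predicts two tokens

print(groupwise_log_prob(joint, x, left_to_right))
print(groupwise_log_prob(joint, x, any_order_groups))
print(math.log(joint[x]))
```

All three printed values agree up to floating-point error, because each factorization is an exact chain-rule decomposition of the same joint. A learned model would approximate each conditional rather than read it off a table, which is where the choice of groups and order starts to matter in practice.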

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
