LG | Apr 8, 2025

Unifying Autoregressive and Diffusion-Based Sequence Generation

arXiv:2504.06416v2 | 15 citations | h-index: 4
Originality: Highly original
AI Analysis

This work addresses a foundational problem in machine learning by bridging two major paradigms for sequence generation, with potential broad impact across AI applications.

The paper tackles the problem of unifying autoregressive and diffusion-based sequence generation by introducing hyperschedules and hybrid token-wise noising processes, achieving state-of-the-art perplexity and generating diverse, high-quality sequences on standard benchmarks.

We present significant extensions to diffusion-based sequence generation models, blurring the line with autoregressive language models. First, we introduce hyperschedules, which assign distinct noise schedules to individual token positions, generalizing both autoregressive models (e.g., GPT) and conventional diffusion models (e.g., SEDD, MDLM) as special cases. Second, we propose two hybrid token-wise noising processes that interpolate between absorbing and uniform processes, enabling the model to fix past mistakes, and we introduce a novel inference algorithm that leverages this new feature in a simplified context inspired by MDLM. To support efficient training and inference, we design attention masks compatible with KV-caching. Our methods achieve state-of-the-art perplexity and generate diverse, high-quality sequences across standard benchmarks, suggesting a promising path for autoregressive diffusion-based sequence generation. See code and resources at https://hdlm-colm.github.io/
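As a rough illustration of the hyperschedule idea, the sketch below builds a table of noise levels indexed by generation step and token position, then checks that an autoregressive schedule and a conventional diffusion schedule both fall out as special cases of a block-wise one. This is one plausible reading of the abstract, not the paper's formal definition: the function names, the linear decay, and the block-wise parameterization are all assumptions made for the example.

```python
from math import ceil

# Hypothetical helpers: each returns rows[step][position] = noise level in [0, 1],
# where 1.0 means fully noised and 0.0 means clean.

def diffusion_schedule(seq_len, num_steps):
    """Conventional diffusion: all positions share one level that decays 1 -> 0."""
    return [[1.0 - t / num_steps] * seq_len for t in range(num_steps + 1)]

def autoregressive_schedule(seq_len):
    """Autoregressive: at step t the first t tokens are clean, the rest fully noised."""
    return [[0.0 if i < t else 1.0 for i in range(seq_len)]
            for t in range(seq_len + 1)]

def block_schedule(seq_len, block_size, steps_per_block):
    """Assumed block-wise hyperschedule: blocks are denoised left to right,
    and tokens inside the active block share a linearly decaying level."""
    num_blocks = ceil(seq_len / block_size)
    rows = []
    for step in range(num_blocks * steps_per_block + 1):
        finished_blocks, step_in_block = divmod(step, steps_per_block)
        row = []
        for i in range(seq_len):
            block = i // block_size
            if block < finished_blocks:
                row.append(0.0)  # already generated
            elif block == finished_blocks:
                row.append(1.0 - step_in_block / steps_per_block)  # being denoised
            else:
                row.append(1.0)  # not started yet
        rows.append(row)
    return rows

# Block size 1 with one step per block recovers the autoregressive schedule;
# a single block spanning the sequence recovers the diffusion schedule.
ar_like = block_schedule(seq_len=4, block_size=1, steps_per_block=1)
diffusion_like = block_schedule(seq_len=4, block_size=4, steps_per_block=4)
```

Intermediate settings, such as `block_schedule(8, 4, 2)`, give schedules that are neither purely autoregressive nor purely diffusion-like, which is the regime the abstract describes as blurring the line between the two.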

