LG | AI | Oct 29, 2024

Discrete Modeling via Boundary Conditional Diffusion Processes

arXiv:2410.22380v1 | h-index: 28 | NIPS
Originality: Highly original
AI Analysis

This work addresses a key bottleneck in applying diffusion models to discrete data, offering improved performance for tasks like language modeling and image generation.

The paper tackles the discrepancy between discrete data and continuous diffusion modeling by introducing a boundary conditional diffusion process. It reports results that surpass previous continuous diffusion language models on three translation tasks and a summarization task, and a new state of the art for categorical image generation on CIFAR-10.

We present a novel framework for efficiently and effectively extending the powerful continuous diffusion processes to discrete modeling. Previous approaches have suffered from the discrepancy between discrete data and continuous modeling. Our study reveals that the absence of guidance from discrete boundaries in learning probability contours is one of the main reasons. To address this issue, we propose a two-step forward process that first estimates the boundary as a prior distribution and then rescales the forward trajectory to construct a boundary conditional diffusion model. The reverse process is proportionally adjusted to guarantee that the learned contours yield more precise discrete data. Experimental results indicate that our approach achieves strong performance in both language modeling and discrete image generation tasks. In language modeling, our approach surpasses previous state-of-the-art continuous diffusion language models in three translation tasks and a summarization task, while also demonstrating competitive performance compared to auto-regressive transformers. Moreover, our method achieves comparable results to continuous diffusion models when using discrete ordinal pixels and establishes a new state-of-the-art for categorical image generation on the CIFAR-10 dataset.
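The two-step forward process in the abstract can be pictured with a toy sketch. This is not the paper's algorithm. It assumes that each discrete token owns the region of embedding space closest to its embedding, and that noise is added with a standard variance-preserving schedule. Every function name here (`nearest_token`, `noisy_point`, `boundary_step`, `rescaled_step`) is hypothetical. Step one finds where a noisy trajectory first leaves its token's region. Step two linearly remaps the step index so the trajectory starts at that boundary.

```python
import numpy as np

def nearest_token(x, embeddings):
    """Index of the embedding closest to x, i.e. the discrete region x falls in."""
    return int(np.argmin(np.linalg.norm(embeddings - x, axis=1)))

def noisy_point(x0, eps, alpha_bar):
    """Variance-preserving forward sample at signal level alpha_bar."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

def boundary_step(x0, eps, embeddings, alpha_bars):
    """First step at which the noisy point leaves x0's region.

    Returns len(alpha_bars) if the trajectory never leaves the region.
    """
    home = nearest_token(x0, embeddings)
    for t, alpha_bar in enumerate(alpha_bars):
        if nearest_token(noisy_point(x0, eps, alpha_bar), embeddings) != home:
            return t
    return len(alpha_bars)

def rescaled_step(t, t0, num_steps):
    """Linearly map step t in [0, num_steps - 1] onto [t0, num_steps - 1]."""
    last = num_steps - 1
    t0 = min(t0, last)  # clamp when the boundary was never crossed
    return t0 + (last - t0) * t / last

# Two tokens on a line; the noise pushes token 1 toward token 0.
embeddings = np.array([[-1.0, 0.0], [1.0, 0.0]])
alpha_bars = np.array([1.0, 0.9, 0.7, 0.4, 0.1])  # assumed toy schedule
x0 = embeddings[1]
eps = np.array([-1.0, 0.0])

t0 = boundary_step(x0, eps, embeddings, alpha_bars)
print(t0)                                     # 3
print(rescaled_step(0, t0, len(alpha_bars)))  # 3.0
```

In this toy setup the first three noise levels leave the point inside token 1's region, so those steps carry no information about which token was noised. Remapping the step index to begin at the boundary is one simple way to illustrate the "rescales the forward trajectory" idea. The paper's actual boundary estimate is a prior distribution, not a scan over steps.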

