LG · May 13

Discrete Stochastic Localization for Non-autoregressive Generation

Yunshu Wu, Jiayi Cheng, Longxuan Yu, Partha Thakuria, Rob Brekelmans, Evangelos E. Papalexakis, Greg Ver Steeg

arXiv:2605.12836

AI Analysis

For researchers in discrete sequence generation, DSL bridges the gap between continuous diffusion and masked discrete models, offering a more flexible and efficient non-autoregressive generation framework.

DSL introduces a continuous-state framework with unit-sphere token embeddings that makes the Bayes-optimal denoiser invariant to the nominal signal-to-noise ratio (SNR), so a single trained network supports a whole family of per-token sampling paths. Fine-tuning a pretrained MDLM with DSL improves MAUVE on OpenWebText across all step budgets (T=128 to T=1024) and enables hybrid continuous-then-discrete sampling with as few as 48 total steps, without distillation or retraining.

Continuous diffusion is a natural framework for non-autoregressive generation but has generally lagged behind masked discrete diffusion models (MDMs) on discrete sequence generation. We argue that the bottleneck is not continuity itself, but a representation in which denoising depends on timestep-indexed noise regimes. We introduce Discrete Stochastic Localization (DSL), a continuous-state framework with unit-sphere token embeddings whose Bayes-optimal denoiser is invariant to the nominal signal-to-noise ratio (SNR) under the localization channel. One trained network then supports an entire family of per-token SNR paths, with endpoint masked-diffusion paths as a special case. Fine-tuning a pretrained MDLM checkpoint with DSL substantially improves distributional faithfulness (MAUVE) on OpenWebText across all step budgets from T=128 to T=1024, and the same checkpoint supports random-order autoregressive sampling, as well as a hybrid continuous-then-discrete sampler using as few as T=48 total steps, without distillation or retraining.
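
The SNR-invariance claim can be illustrated in a single-token toy setting. The sketch below is not the authors' implementation. It assumes the standard stochastic-localization observation y = t·x + sqrt(t)·ε with ε standard Gaussian, a uniform prior over tokens, and randomly drawn unit-norm embeddings. All function names are hypothetical. Under these assumptions the token posterior is proportional to exp(⟨y, e_k⟩ − t·‖e_k‖²/2), and when every ‖e_k‖ = 1 the t-dependent term is the same for all tokens and cancels in the softmax, so the posterior can be computed from y alone.

```python
import numpy as np


def unit_sphere_embeddings(vocab_size, dim, rng):
    """Random token embeddings normalized to unit length (toy stand-in)."""
    E = rng.normal(size=(vocab_size, dim))
    return E / np.linalg.norm(E, axis=1, keepdims=True)


def localization_observation(x, t, rng):
    """Assumed localization channel: y = t * x + sqrt(t) * standard Gaussian noise."""
    return t * x + np.sqrt(t) * rng.normal(size=x.shape)


def token_posterior(y, E, t=None):
    """Posterior over tokens given y, with a uniform prior.

    With t given, uses the full log-likelihood term <y, e_k> - t * ||e_k||^2 / 2.
    With t=None, uses <y, e_k> only, i.e. the posterior is computed without
    knowing the SNR level.
    """
    logits = E @ y
    if t is not None:
        logits = logits - 0.5 * t * np.sum(E ** 2, axis=1)
    logits = logits - logits.max()  # numerical stability
    p = np.exp(logits)
    return p / p.sum()


rng = np.random.default_rng(0)
E = unit_sphere_embeddings(vocab_size=8, dim=16, rng=rng)
t = 5.0
y = localization_observation(E[3], t, rng)

p_knowing_t = token_posterior(y, E, t=t)
p_ignoring_t = token_posterior(y, E)
print("max abs difference:", np.abs(p_knowing_t - p_ignoring_t).max())
```

With embeddings of unequal norm, the two posteriors generally differ, which is the sense in which the unit-sphere constraint matters in this toy case. The sequence-level denoiser in the paper is a learned network and is not reproduced here.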
