Step-unrolled Denoising Autoencoders for Text Generation
SUNDAE addresses a limitation of autoregressive text generation, its fixed left-to-right order, by allowing tokens to be generated and filled in at arbitrary positions. The contribution is incremental, since it builds on denoising diffusion techniques.
The paper proposes SUNDAE, a non-autoregressive generative model of text that starts from random inputs and iteratively improves them. It achieves state-of-the-art results among non-autoregressive methods on WMT'14 English-to-German translation and produces qualitatively good samples on unconditional language modeling datasets.
In this paper we propose a new generative model of text, Step-unrolled Denoising Autoencoder (SUNDAE), that does not rely on autoregressive models. Similarly to denoising diffusion techniques, SUNDAE is repeatedly applied on a sequence of tokens, starting from random inputs and improving them each time until convergence. We present a simple new improvement operator that converges in fewer iterations than diffusion methods, while qualitatively producing better samples on natural language datasets. SUNDAE achieves state-of-the-art results (among non-autoregressive methods) on the WMT'14 English-to-German translation task and good qualitative results on unconditional language modeling on the Colossal Cleaned Common Crawl dataset and a dataset of Python code from GitHub. The non-autoregressive nature of SUNDAE opens up possibilities beyond left-to-right prompted generation, by filling in arbitrary blank patterns in a template.
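The sampling procedure described in the abstract (start from random tokens, apply the model repeatedly until the sequence stops changing, and optionally hold some positions fixed to fill in a template) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `toy_denoiser`, `iterative_refine`, `VOCAB_SIZE`, and the dictionary-based template format are all hypothetical names introduced here, and the toy denoiser is a deterministic stand-in for a trained network, which in practice would produce a distribution over tokens to sample from.

```python
import random

VOCAB_SIZE = 8  # hypothetical toy vocabulary size


def toy_denoiser(tokens):
    """Hypothetical stand-in for a trained denoising network.

    Moves each token one step toward a fixed target (position index mod
    VOCAB_SIZE), so repeated application reaches a fixed point.
    """
    out = []
    for i, tok in enumerate(tokens):
        target = i % VOCAB_SIZE
        if tok < target:
            out.append(tok + 1)
        elif tok > target:
            out.append(tok - 1)
        else:
            out.append(tok)
    return out


def iterative_refine(denoiser, length, template=None, max_steps=50, seed=0):
    """Start from random tokens and apply `denoiser` until nothing changes.

    `template` maps position -> token for slots that must stay fixed;
    every position not listed in it is treated as a blank to fill in.
    Returns the final tokens and the number of denoiser calls made.
    """
    rng = random.Random(seed)
    template = template or {}
    # Random initialisation, with the given template tokens written in.
    tokens = [rng.randrange(VOCAB_SIZE) for _ in range(length)]
    for pos, tok in template.items():
        tokens[pos] = tok
    for step in range(1, max_steps + 1):
        proposal = denoiser(tokens)
        # Clamp template positions so that only the blanks are rewritten.
        for pos, tok in template.items():
            proposal[pos] = tok
        if proposal == tokens:  # fixed point reached: stop early
            return tokens, step
        tokens = proposal
    return tokens, max_steps


# Unconditional generation, then infilling with positions 0 and 3 held fixed.
samples, steps = iterative_refine(toy_denoiser, length=6)
infilled, _ = iterative_refine(toy_denoiser, length=6, template={0: 7, 3: 5})
```

The same loop serves both cases: unconditional generation is infilling with an empty template, and left-to-right prompting is the special case where the fixed positions form a prefix. The fixed positions may equally be scattered anywhere in the sequence, which is what sets this style of generation apart from prompted autoregressive decoding.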