Self-conditioned Embedding Diffusion for Text Generation
This work addresses efficient, high-quality text generation for natural language processing applications. It is an incremental step that adapts diffusion techniques from image generation to text.
The paper tackles the challenge of applying continuous diffusion models to text generation by proposing Self-conditioned Embedding Diffusion, which operates on token embeddings and enables flexible, scalable models for both conditional and unconditional text generation. It shows that the generated samples are comparable to those produced by standard autoregressive language models.
Can continuous diffusion models bring the same performance breakthrough to natural language that they brought to image generation? To circumvent the discrete nature of text data, we can simply project tokens into a continuous space of embeddings, as is standard in language modeling. We propose Self-conditioned Embedding Diffusion, a continuous diffusion mechanism that operates on token embeddings and makes it possible to learn flexible and scalable diffusion models for both conditional and unconditional text generation. Through qualitative and quantitative evaluation, we show that our text diffusion models generate samples comparable with those produced by standard autoregressive language models, while being, in theory, more efficient on accelerator hardware at inference time. Our work paves the way for scaling up diffusion models for text, as has been done for autoregressive models, and for improving performance with recent refinements to continuous diffusion.
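To make the mechanism in the abstract more concrete, here is a minimal Python sketch of diffusion on token embeddings with self-conditioning. It rests on several assumptions that the abstract does not state: a tiny hypothetical 4-token vocabulary with fixed 2-d embeddings (`EMBEDDINGS`), a cosine noise schedule (`alpha_bar`), a deterministic DDIM-style sampler, and a hand-written `toy_denoiser` standing in for a trained network. "Self-conditioning" is taken in its usual diffusion sense, meaning the model's previous estimate of the clean embeddings is fed back as an extra input at the next step. None of these names come from the paper, and this is not the authors' implementation.

```python
import math
import random

# Hypothetical toy embedding table: token id -> 2-d embedding vector.
# A real model would use learned embeddings over a real vocabulary.
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]


def alpha_bar(t):
    """Assumed cosine schedule for t in [0, 1], clipped to avoid division by zero."""
    return min(max(math.cos(0.5 * math.pi * t) ** 2, 1e-4), 1.0 - 1e-4)


def embed(token_ids):
    """Project discrete tokens into the continuous embedding space."""
    return [list(EMBEDDINGS[i]) for i in token_ids]


def q_sample(x0, t, rng):
    """Forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, eps ~ N(0, I)."""
    a = alpha_bar(t)
    return [[math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in vec]
            for vec in x0]


def round_to_tokens(x):
    """Map each continuous vector back to the id of its nearest embedding."""
    def nearest(vec):
        return min(range(len(EMBEDDINGS)),
                   key=lambda i: sum((p - q) ** 2 for p, q in zip(vec, EMBEDDINGS[i])))
    return [nearest(vec) for vec in x]


def toy_denoiser(x_t, x0_prev, t):
    """Hypothetical stand-in for a trained network f(x_t, x0_prev, t) -> x0 estimate.

    Averages the rescaled noisy input with the previous estimate (the
    self-conditioning input), then snaps the result to the nearest embedding.
    """
    a = alpha_bar(t)
    mixed = [[0.5 * (v / math.sqrt(a)) + 0.5 * p for v, p in zip(vec, prev)]
             for vec, prev in zip(x_t, x0_prev)]
    return embed(round_to_tokens(mixed))


def sample(length, steps, rng, dim=2):
    """Deterministic DDIM-style sampling loop with self-conditioning."""
    x = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(length)]
    x0_prev = [[0.0] * dim for _ in range(length)]  # no estimate before the first step
    for k in range(steps, 0, -1):
        t, t_next = k / steps, (k - 1) / steps
        x0_hat = toy_denoiser(x, x0_prev, t)  # all positions are updated in parallel
        a, a_next = alpha_bar(t), alpha_bar(t_next)
        # Recover the implied noise, then move to the less noisy level t_next.
        eps_hat = [[(v - math.sqrt(a) * h) / math.sqrt(1.0 - a) for v, h in zip(vec, hv)]
                   for vec, hv in zip(x, x0_hat)]
        x = [[math.sqrt(a_next) * h + math.sqrt(1.0 - a_next) * e for h, e in zip(hv, ev)]
             for hv, ev in zip(x0_hat, eps_hat)]
        x0_prev = x0_hat  # self-conditioning: feed this estimate into the next step
    return round_to_tokens(x0_prev)


print(sample(length=5, steps=10, rng=random.Random(0)))
```

The sketch only shows the data flow: tokens are embedded, noised, iteratively denoised with the previous estimate fed back in, and finally rounded to the nearest embedding to recover discrete tokens. Each step updates every position at once rather than decoding token by token, which is presumably the intuition behind the abstract's claim of better efficiency on accelerator hardware.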