Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation
This paper addresses the problem of discrete text generation with diffusion models. It is aimed at researchers and practitioners in natural language processing and represents an incremental improvement over prior diffusion approaches.
The paper tackles the challenge of adapting diffusion models to text generation by proposing Smoothie, a method that smooths token embeddings based on semantic similarity and outperforms existing diffusion-based models in generation quality on sequence-to-sequence tasks.
Diffusion models have achieved state-of-the-art performance in generating images, audio, and video, but their adaptation to text remains challenging due to its discrete nature. Prior approaches either apply Gaussian diffusion in continuous latent spaces, which inherits semantic structure but struggles with token decoding, or operate in the categorical simplex space, which respects discreteness but disregards semantic relations between tokens. In this paper, we propose Smoothing Diffusion on Token Embeddings (Smoothie), a novel diffusion method that combines the strengths of both approaches by progressively smoothing token embeddings based on semantic similarity. This technique enables gradual information removal while maintaining a natural decoding process. Experimental results on several sequence-to-sequence generation tasks demonstrate that Smoothie outperforms existing diffusion-based models in generation quality. Furthermore, ablation studies show that our proposed diffusion space yields better performance than both the standard embedding space and the categorical simplex. Our code is available at https://github.com/ashaba1in/smoothie.
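As a rough illustration of what "progressively smoothing token embeddings based on semantic similarity" could look like, the sketch below replaces a token's embedding with a similarity-weighted average over the vocabulary. Everything in it is an assumption made for illustration and is not taken from the paper: the Gaussian kernel on squared Euclidean distance, the bandwidth parameter `sigma`, the helper name `smooth_token`, and the toy random embedding matrix are all hypothetical.

```python
import numpy as np


def smooth_token(embeddings, token_id, sigma):
    """Hypothetical helper: smooth one token over the vocabulary.

    Returns (weights, smoothed_embedding), where `weights` is a
    distribution over the vocabulary built from a Gaussian kernel
    on embedding distances (an assumed similarity measure).
    """
    # Squared Euclidean distance from the chosen token to every token.
    diffs = embeddings - embeddings[token_id]
    sq_dist = np.sum(diffs ** 2, axis=1)

    # Gaussian-kernel logits; a larger sigma flattens the distribution.
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits = logits - logits.max()  # numerical stability before exp

    weights = np.exp(logits)
    weights = weights / weights.sum()

    # Smoothed embedding: similarity-weighted average of all embeddings.
    return weights, weights @ embeddings


# Toy vocabulary of 5 tokens with 3-dimensional embeddings (made up).
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 3))

for sigma in (0.1, 1.0, 10.0):
    w, smoothed = smooth_token(E, token_id=2, sigma=sigma)
    # Decoding under this sketch: pick the token with the largest weight.
    print(f"sigma={sigma}: weights={np.round(w, 3)}, decoded={int(np.argmax(w))}")
```

In this sketch, a small `sigma` leaves the weights close to one-hot, so the original token is easy to recover, while a large `sigma` pushes the weights toward uniform and the smoothed embedding toward the vocabulary mean. That is one way to picture gradual information removal that still admits a simple decoding rule; the repository linked above is the reference for the actual method.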