LG AI · Oct 28, 2025

Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models

Byeonghu Na, Minsang Park, Gyuwon Sim, Donghyeok Shin, HeeSun Bae, Mina Kang, Se Jung Kwon, Wanmo Kang, Il-Chul Moon

arXiv:2510.23974v1 · 5 citations · h-index: 9 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in text-to-image generation for users who need better control and text-image alignment, though the advance is incremental because it builds on existing diffusion models.

The paper addresses a limitation of text-to-image diffusion models: text embeddings stay fixed throughout generation, which limits adaptability. It proposes DATE, a method that updates the embeddings at each diffusion timestep, and reports improved text-image alignment without additional training.

Text-to-image diffusion models rely on text embeddings from a pre-trained text encoder, but these embeddings remain fixed across all diffusion timesteps, limiting their adaptability to the generative process. We propose Diffusion Adaptive Text Embedding (DATE), which dynamically updates text embeddings at each diffusion timestep based on intermediate perturbed data. We formulate an optimization problem and derive an update rule that refines the text embeddings at each sampling step to improve alignment and preference between the mean predicted image and the text. This allows DATE to dynamically adapt the text conditions to the reverse-diffused images throughout diffusion sampling without requiring additional model training. Through theoretical analysis and empirical results, we show that DATE maintains the generative capability of the model while providing superior text-image alignment over fixed text embeddings across various tasks, including multi-concept generation and text-guided image editing. Our code is available at https://github.com/aailab-kaist/DATE.
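
The idea of refining the embedding at each sampling step can be illustrated with a toy NumPy sketch. This is not the paper's update rule or code: the linear "denoiser" `predict_x0`, the quadratic `score`, the target vector `y`, the step size `rho`, and the per-step weights are all hypothetical stand-ins chosen so the gradient can be written by hand. The sketch only shows the general pattern of taking a small score-improving step from the original embedding before each reverse step.

```python
import numpy as np

def predict_x0(x_t, c, w_t, W):
    """Toy stand-in for the mean predicted image under embedding c (assumption)."""
    return (1.0 - w_t) * x_t + w_t * (W @ c)

def score(x0_hat, y):
    """Toy alignment score: higher when the prediction is closer to target y."""
    r = x0_hat - y
    return -0.5 * float(r @ r)

def adapt_embedding(c_orig, x_t, w_t, W, y, rho=0.1):
    """One normalized gradient-ascent step on the score, starting from c_orig."""
    r = predict_x0(x_t, c_orig, w_t, W) - y
    grad = -w_t * (W.T @ r)  # analytic gradient of score with respect to c
    norm = np.linalg.norm(grad)
    if norm < 1e-12:
        return c_orig.copy()  # nothing to improve; keep the original embedding
    return c_orig + rho * grad / norm

def sample(x_T, c_orig, W, y, weights, rho=0.1, adapt=True):
    """Toy sampling loop; with adapt=False the embedding stays fixed."""
    x = x_T.copy()
    for w_t in weights:
        c_t = adapt_embedding(c_orig, x, w_t, W, y, rho) if adapt else c_orig
        x = predict_x0(x, c_t, w_t, W)  # toy reverse step using the step's embedding
    return x
```

In this sketch the embedding is re-derived from the original one at every step, so the model weights are never touched; only the conditioning vector changes with the intermediate sample.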

