Text-to-image Diffusion Models in Generative AI: A Survey
This survey provides a comprehensive overview for researchers and practitioners in generative AI, but its contribution as a survey is incremental.
This survey reviews the progress of diffusion models for generating images from text, summarizing pioneering methods, improvements, and applications like video generation and image editing.
This survey reviews the progress of diffusion models in generating images from text, i.e., text-to-image diffusion models. As a self-contained work, this survey starts with a brief introduction to how diffusion models work for image synthesis, followed by the background on text-conditioned image synthesis. Building on that, we present an organized review of pioneering methods and their improvements in text-to-image generation. We further summarize applications beyond image generation, such as text-guided generation for other modalities like video, and text-guided image editing. Beyond the progress made so far, we discuss existing challenges and promising future directions.