CV | Mar 7, 2024

Controllable Generation with Text-to-Image Diffusion Models: A Survey

arXiv:2403.04279v1 | 94 citations | h-index: 15 | Has Code | IEEE Trans Pattern Anal Mach Intell
Originality: Synthesis-oriented
AI Analysis

The survey provides a comprehensive overview for researchers and practitioners in AI and computer vision, but its contribution is incremental: it synthesizes existing work rather than introducing new methods.

This survey reviews methods for enhancing text-to-image diffusion models to support additional conditions beyond text, addressing limitations in meeting varied application needs, and categorizes research into specific, multiple, and universal controllable generation approaches.

In the rapidly advancing realm of visual generation, diffusion models have revolutionized the landscape, marking a significant shift in capabilities with their impressive text-guided generative functions. However, relying solely on text for conditioning these models does not fully cater to the varied and complex requirements of different applications and scenarios. Acknowledging this shortfall, a variety of studies aim to control pre-trained text-to-image (T2I) models to support novel conditions. In this survey, we undertake a thorough review of the literature on controllable generation with T2I diffusion models, covering both the theoretical foundations and practical advancements in this domain. Our review begins with a brief introduction to the basics of denoising diffusion probabilistic models (DDPMs) and widely used T2I diffusion models. We then reveal the controlling mechanisms of diffusion models, theoretically analyzing how novel conditions are introduced into the denoising process for conditional generation. Additionally, we offer a detailed overview of research in this area, organizing it into distinct categories from the condition perspective: generation with specific conditions, generation with multiple conditions, and universal controllable generation. For an exhaustive list of the controllable generation literature surveyed, please refer to our curated repository at https://github.com/PRIV-Creation/Awesome-Controllable-T2I-Diffusion-Models.
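As a rough illustration of the two ingredients the abstract names (the DDPM noising process and the injection of a condition into denoising), the following is a minimal sketch and not the survey's own formulation. It assumes the linear beta schedule commonly used with DDPMs and uses classifier-free guidance as one well-known conditioning mechanism; the function names `make_alpha_bar`, `q_sample`, and `guided_eps` are hypothetical, and the noise predictions are stand-in lists rather than outputs of a real network.

```python
import math
import random


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t from q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    x_t = [signal * x + sigma * e for x, e in zip(x0, noise)]
    return x_t, noise


def guided_eps(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: shift the unconditional noise prediction
    toward the conditional one by a factor of `scale`."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bar = make_alpha_bar()
    x_t, noise = q_sample([1.0, -1.0, 0.5], 500, alpha_bar, rng)
    # Stand-in predictions; a real model would produce these from x_t, t, and the condition.
    print(guided_eps([0.0, 1.0], [1.0, 3.0], 7.5))
```

With `scale` set to 0 the guided prediction ignores the condition, and with `scale` set to 1 it equals the conditional prediction; larger values push samples harder toward the condition.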

Code Implementations: 1 repo