Self-Guided Diffusion Models
This work addresses the annotation dependency of guided diffusion models for image generation. It offers a flexible and scalable solution that is particularly beneficial for unbalanced datasets, though the contribution is incremental because it builds on existing guidance techniques.
The paper tackles the dependency of guided diffusion models on large numbers of image-annotation pairs by introducing a self-guided framework that derives guidance from self-supervision signals, eliminating the need for annotations. In experiments on single-label and multi-label datasets, the method outperforms unguided diffusion models and can surpass ground-truth label guidance on unbalanced data.
Diffusion models have demonstrated remarkable progress in image generation quality, especially when guidance is used to control the generative process. However, guidance requires a large amount of image-annotation pairs for training and is thus dependent on their availability, correctness and unbiasedness. In this paper, we eliminate the need for such annotation by instead leveraging the flexibility of self-supervision signals to design a framework for self-guided diffusion models. By leveraging a feature extraction function and a self-annotation function, our method provides guidance signals at various image granularities: from the level of holistic images to object boxes and even segmentation masks. Our experiments on single-label and multi-label image datasets demonstrate that self-labeled guidance always outperforms diffusion models without guidance and may even surpass guidance based on ground-truth labels, especially on unbalanced data. When equipped with self-supervised box or mask proposals, our method further generates visually diverse yet semantically consistent images, without the need for any class, box, or segment label annotation. Self-guided diffusion is simple, flexible and expected to profit from deployment at scale. Source code will be at: https://taohu.me/sgdm/
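The two-function pipeline described in the abstract (a feature extraction function followed by a self-annotation function) can be illustrated with a minimal sketch for image-level guidance. Everything below is an assumption made for illustration and is not taken from the paper's code: the feature extractor is a fixed random projection standing in for a self-supervised backbone, the self-annotation function is plain k-means, and the guidance step uses the common classifier-free formulation with a hypothetical guidance weight `w`. The names `extract_features`, `self_annotate`, and `guided_noise` are invented here.

```python
import numpy as np

def extract_features(images, proj):
    """Hypothetical feature extractor: flatten, project, L2-normalise.
    A real system would use a self-supervised backbone instead."""
    flat = images.reshape(len(images), -1)
    feats = flat @ proj
    return feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)

def self_annotate(feats, k, iters=20, seed=0):
    """Hypothetical self-annotation function: k-means cluster ids
    serve as pseudo-labels in place of human annotations."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points (indexing copies).
    centroids = feats[rng.choice(len(feats), size=k, replace=False)]
    labels = np.zeros(len(feats), dtype=np.int64)
    for _ in range(iters):
        # Squared Euclidean distance from every feature to every centroid.
        dists = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = feats[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels

def guided_noise(eps_cond, eps_uncond, w):
    """Classifier-free guidance: combine the noise prediction conditioned
    on the pseudo-label with the unconditional one; w = 0 disables guidance."""
    return (1.0 + w) * eps_cond - w * eps_uncond

# Usage on random stand-in data (no real images or trained model).
rng = np.random.default_rng(0)
images = rng.normal(size=(32, 3, 8, 8))
proj = rng.normal(size=(3 * 8 * 8, 16))
pseudo_labels = self_annotate(extract_features(images, proj), k=4)
```

In a full system the pseudo-labels would condition the denoising network during training, and `guided_noise` would be applied at each sampling step. Box-level or mask-level guidance would replace the cluster id with a self-supervised box or segmentation proposal.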