CV | Jun 25, 2023

DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data

arXiv:2306.14153v40.2120 citations | h-index: 14
AI Analysis 85

This work addresses the challenge of adapting large-scale generative models to new domains with minimal data, which is crucial for applications requiring specialized image generation without extensive datasets.

The paper fine-tunes diffusion models for domain-driven image generation from limited data. It reports better quality and greater diversity than state-of-the-art GAN-based approaches in unconditional few-shot generation, and reduced overfitting in conditional generation.

Denoising diffusion probabilistic models (DDPMs) have been proven capable of synthesizing high-quality images with remarkable diversity when trained on large amounts of data. Typical diffusion models and modern large-scale conditional generative models like text-to-image generative models are vulnerable to overfitting when fine-tuned on extremely limited data. Existing works have explored subject-driven generation using a reference set containing a few images. However, few prior works explore DDPM-based domain-driven generation, which aims to learn the common features of target domains while maintaining diversity. This paper proposes a novel DomainStudio approach to adapt DDPMs pre-trained on large-scale source datasets to target domains using limited data. It is designed to keep the diversity of subjects provided by source domains and get high-quality and diverse adapted samples in target domains. We propose to keep the relative distances between adapted samples to achieve considerable generation diversity. In addition, we further enhance the learning of high-frequency details for better generation quality. Our approach is compatible with both unconditional and conditional diffusion models. This work makes the first attempt to realize unconditional few-shot image generation with diffusion models, achieving better quality and greater diversity than current state-of-the-art GAN-based approaches. Moreover, this work also significantly relieves overfitting for conditional generation and realizes high-quality domain-driven generation, further expanding the applicable scenarios of modern large-scale text-to-image models.
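The two ideas in the abstract, keeping the relative distances between adapted samples and strengthening high-frequency detail, can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the use of cosine similarity with a softmax and a KL divergence, and the average-pool residual used as a stand-in for high-frequency content are all assumptions chosen for illustration.

```python
import numpy as np


def pairwise_similarity_distribution(feats):
    """For each sample, a softmax over its cosine similarities to the others.

    Hypothetical helper: `feats` has shape (n, ...) with n >= 2 samples.
    """
    flat = feats.reshape(len(feats), -1)
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit.T
    n = len(sim)
    # Drop self-similarities so each row only describes relations to others.
    off = sim[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    off = off - off.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(off)
    return exp / exp.sum(axis=1, keepdims=True)


def relative_distance_loss(source_feats, adapted_feats):
    """Mean KL divergence between source and adapted similarity distributions.

    Assumption: both batches are generated from the same noise inputs, so row i
    of each batch corresponds to the same underlying sample.
    """
    p = pairwise_similarity_distribution(source_feats)
    q = pairwise_similarity_distribution(adapted_feats)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1)
    return float(np.mean(kl))


def high_frequency_residual(images):
    """Image minus its 2x2 average-pooled, nearest-upsampled version.

    A simple stand-in for a high-pass filter; expects shape (n, h, w) with
    even h and w.
    """
    n, h, w = images.shape
    low = images.reshape(n, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    low_up = low.repeat(2, axis=1).repeat(2, axis=2)
    return images - low_up


def high_frequency_loss(adapted, target):
    """Mean squared error between the high-frequency residuals of two batches."""
    diff = high_frequency_residual(adapted) - high_frequency_residual(target)
    return float(np.mean(diff ** 2))
```

Under these assumptions, the relative-distance term is near zero when the adapted samples preserve the similarity structure of the source samples, and it grows when the adapted samples collapse toward one another, which is the overfitting pattern the abstract describes. In a real fine-tuning setup, terms like these would be added with weights to the usual denoising objective.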

Code Implementations: 1 repo