Expanding Expressiveness of Diffusion Models with Limited Data via Self-Distillation based Fine-Tuning
This work addresses the challenge of training diffusion models effectively on limited data, a common issue in many downstream applications. The contribution appears incremental, however, because it builds on existing fine-tuning and distillation techniques.
The paper tackles the limited generation capacity and expressiveness of diffusion models trained on small datasets. It proposes Self-Distillation for Fine-Tuning (SDFT), which transfers general features from large pretrained models to improve performance on tasks such as domain translation and text-guided image manipulation. Experimental results show enhanced expressiveness.
Training diffusion models on limited datasets poses challenges in terms of limited generation capacity and expressiveness, leading to unsatisfactory results in various downstream tasks that use pretrained diffusion models, such as domain translation and text-guided image manipulation. In this paper, we propose Self-Distillation for Fine-Tuning diffusion models (SDFT), a methodology that addresses these challenges by leveraging diverse features from diffusion models pretrained on large source datasets. SDFT distills more general features (shape, colors, etc.) and fewer domain-specific features (texture, fine details, etc.) from the source model, allowing successful knowledge transfer without disturbing the training process on target datasets. The proposed method is not constrained by the specific architecture of the model and can therefore be adopted in existing frameworks. Experimental results demonstrate that SDFT enhances the expressiveness of diffusion models trained on limited datasets, resulting in improved generation capabilities across various downstream tasks.
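The abstract describes SDFT only at a high level. As a rough illustration of the general idea of fine-tuning a diffusion model while self-distilling from a frozen copy of the source-pretrained model, the following is a minimal NumPy sketch, not the authors' formulation. Everything in it is an assumption made for illustration: the toy linear noise predictors, the linear beta schedule, the names `sdft_style_loss`, `distill_weight`, and `lam`, and in particular the hypothetical weighting that applies distillation mostly at high-noise timesteps (on the assumption that coarse shape and color are settled there, while low-noise steps refine texture and fine detail).

```python
import numpy as np


def linear_alpha_bar(T: int) -> np.ndarray:
    """Cumulative products of (1 - beta_t) for a DDPM-style linear beta schedule."""
    betas = np.linspace(1e-4, 0.02, T)
    return np.cumprod(1.0 - betas)


def distill_weight(t: np.ndarray, T: int) -> np.ndarray:
    """HYPOTHETICAL per-timestep distillation weight in [0, 1].

    Grows linearly with the noise level, so distillation acts mostly on
    high-noise steps (assumed here to settle coarse shape and color) and
    vanishes at t = 0 (assumed to mostly refine texture and fine detail).
    """
    return t / (T - 1)


def linear_eps_model(W: np.ndarray):
    """Toy stand-in for a noise-prediction network: eps_hat = x_t @ W (ignores t)."""
    def predict(x_t: np.ndarray, t: np.ndarray) -> np.ndarray:
        return x_t @ W
    return predict


def sdft_style_loss(student, teacher, x0, t, noise, alpha_bar, lam=1.0):
    """Denoising loss on target data plus a timestep-weighted distillation term.

    student: model being fine-tuned on the (small) target dataset.
    teacher: frozen copy of the source-pretrained model; in a real framework
             its output would be computed without gradients.
    """
    T = len(alpha_bar)
    a = alpha_bar[t][:, None]                          # (B, 1), broadcasts over features
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise   # sample from q(x_t | x_0)
    eps_s = student(x_t, t)
    eps_t = teacher(x_t, t)
    denoise = np.mean((eps_s - noise) ** 2, axis=1)    # standard epsilon-prediction loss
    distill = np.mean((eps_s - eps_t) ** 2, axis=1)    # pull student toward source model
    return float(np.mean(denoise + lam * distill_weight(t, T) * distill))


rng = np.random.default_rng(0)
T, B, D = 1000, 8, 4
alpha_bar = linear_alpha_bar(T)
x0 = rng.normal(size=(B, D))                  # stand-in for a batch of target images
noise = rng.normal(size=(B, D))
W_src = rng.normal(size=(D, D))               # toy "source-pretrained" weights
teacher = linear_eps_model(W_src)
student = linear_eps_model(W_src + 0.1 * rng.normal(size=(D, D)))
t = rng.integers(0, T, size=B)
print(sdft_style_loss(student, teacher, x0, t, noise, alpha_bar))
```

In this sketch the teacher never changes, so the target-domain denoising loss remains the only signal at low-noise steps, while the distillation term keeps high-noise predictions close to the source model. Whether SDFT splits general and domain-specific features this way is an assumption; the abstract does not specify the mechanism.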