SVDiff: Compact Parameter Space for Diffusion Fine-Tuning
This work addresses inefficient model storage and overfitting in diffusion-based image personalization. For AI practitioners, it represents an incremental improvement: a novel method applied to a known bottleneck.
The paper tackles two limitations of personalizing existing text-to-image diffusion models: overfitting and large parameter sizes. It proposes SVDiff, a method that fine-tunes the singular values of the weight matrices, resulting in approximately 2,200 times fewer parameters than vanilla DreamBooth.
Diffusion models have achieved remarkable success in text-to-image generation, enabling the creation of high-quality images from text prompts or other modalities. However, existing methods for customizing these models struggle to handle multiple personalized subjects and are prone to overfitting. Moreover, their large number of parameters makes model storage inefficient. In this paper, we propose a novel approach to address these limitations in existing text-to-image diffusion models for personalization. Our method involves fine-tuning the singular values of the weight matrices, leading to a compact and efficient parameter space that reduces the risk of overfitting and language drifting. We also propose a Cut-Mix-Unmix data-augmentation technique to enhance the quality of multi-subject image generation and a simple text-based image editing framework. Our proposed SVDiff method has a significantly smaller model size compared to existing methods (approximately 2,200 times fewer parameters compared with vanilla DreamBooth), making it more practical for real-world applications.
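To make the idea of fine-tuning singular values concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one plausible parameterization: each weight matrix is factored once with an SVD, the singular vectors stay frozen, and a learned per-singular-value shift (called `delta` here, clamped so the shifted values stay non-negative) is the only trainable quantity. The function names and the 320 x 768 layer shape are hypothetical and chosen purely for illustration.

```python
import numpy as np


def decompose(weight):
    """Factor a 2-D weight matrix once; U, sigma, and Vt are then kept frozen."""
    u, sigma, vt = np.linalg.svd(weight, full_matrices=False)
    return u, sigma, vt


def rebuild(u, sigma, vt, delta):
    """Rebuild the weight from shifted singular values.

    Assumption: the shifted values are clamped at zero so they remain
    valid (non-negative) singular values.
    """
    shifted = np.maximum(sigma + delta, 0.0)
    # Scaling the columns of U by `shifted` is equivalent to U @ diag(shifted).
    return (u * shifted) @ vt


rng = np.random.default_rng(0)
weight = rng.standard_normal((320, 768))  # hypothetical layer shape

u, sigma, vt = decompose(weight)
delta = np.zeros_like(sigma)  # the only parameters a fine-tuning run would update

restored = rebuild(u, sigma, vt, delta)

full_params = weight.size  # what full fine-tuning of this layer would store
svd_params = delta.size    # what the singular-value shift stores
```

With a zero shift, `rebuild` returns the original weight up to floating-point error, so fine-tuning starts from the pretrained model. For this one hypothetical layer the stored shift has 320 entries against 245,760 weights; the per-layer saving depends on the layer shape, so this toy ratio is not meant to reproduce the roughly 2,200-fold figure reported for the whole model.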