CV · GR · LG · Dec 8, 2022

Multi-Concept Customization of Text-to-Image Diffusion

arXiv:2212.04488v2 · 1310 citations · h-index: 73
AI Analysis

The paper addresses users' need to personalize AI-generated images with their own concepts. It is an incremental improvement in efficiency and multi-concept handling.

The paper tackles the problem of efficiently customizing text-to-image diffusion models so that they learn new concepts from a few examples and compose multiple concepts together. It reports results on par with or better than several baselines, with tuning times of about 6 minutes.

While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to quickly acquire a new concept, given a few examples? Furthermore, can we compose multiple new concepts together? We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models. We find that only optimizing a few parameters in the text-to-image conditioning mechanism is sufficiently powerful to represent new concepts while enabling fast tuning (~6 minutes). Additionally, we can jointly train for multiple concepts or combine multiple fine-tuned models into one via closed-form constrained optimization. Our fine-tuned model generates variations of multiple new concepts and seamlessly composes them with existing concepts in novel settings. Our method outperforms or performs on par with several baselines and concurrent works in both qualitative and quantitative evaluations while being memory and computationally efficient.
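The idea of tuning only a few parameters in the conditioning mechanism can be illustrated with a small sketch. The abstract does not name the exact weights, so this example assumes that the conditioning mechanism is the cross-attention layers and that the tuned subset is their key and value projections. The parameter names, shapes, and both helper functions are hypothetical and exist only for illustration; this is not the authors' implementation.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical parameter inventory: name -> number of scalar weights.
# The naming scheme (attn1 = self-attention, attn2 = cross-attention on
# the text embedding) and the sizes are assumptions, not from the paper.
PARAMS: Dict[str, int] = {
    "down.0.resnet.conv1.weight": 320 * 320 * 9,
    "down.0.attn1.to_q.weight": 320 * 320,
    "down.0.attn1.to_k.weight": 320 * 320,
    "down.0.attn1.to_v.weight": 320 * 320,
    "down.0.attn2.to_q.weight": 320 * 320,
    "down.0.attn2.to_k.weight": 320 * 768,  # projects text features to keys
    "down.0.attn2.to_v.weight": 320 * 768,  # projects text features to values
    "down.0.attn2.to_out.weight": 320 * 320,
}


def select_trainable(
    names: Iterable[str],
    markers: Tuple[str, ...] = ("attn2.to_k", "attn2.to_v"),
) -> List[str]:
    """Return the parameter names that contain any of the given markers."""
    return [n for n in names if any(m in n for m in markers)]


def trainable_fraction(params: Dict[str, int], trainable: List[str]) -> float:
    """Share of all scalar weights that belong to the trainable subset."""
    total = sum(params.values())
    return sum(params[n] for n in trainable) / total


trainable = select_trainable(PARAMS)
fraction = trainable_fraction(PARAMS, trainable)

print("trainable:", trainable)
print(f"fraction of weights updated: {fraction:.2%}")
```

In a real training framework, a selection like this would typically decide which tensors receive gradient updates while everything else stays frozen. The fraction printed here reflects only the toy inventory above, not the actual model.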

Code Implementations: 2 repos