Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion
This work addresses continual learning for text-to-image diffusion models, which could make AI systems for creative applications more efficient and adaptable. The contribution is incremental, since it builds on existing customization methods.
The paper tackles catastrophic forgetting in lifelong few-shot customization of text-to-image diffusion models. It proposes a data-free knowledge distillation strategy and an In-Context Generation paradigm that retain old knowledge while learning new tasks from minimal data, and it reports high-quality image generation in its experiments.
Lifelong few-shot customization for text-to-image diffusion aims to continually generalize existing models to new tasks with minimal data while preserving old knowledge. Current customized diffusion models excel at few-shot tasks but suffer from catastrophic forgetting in lifelong generation. In this study, we identify two forms of catastrophic forgetting: relevant concepts forgetting and previous concepts forgetting. To address these challenges, we first devise a data-free knowledge distillation strategy to tackle relevant concepts forgetting. Unlike existing methods that rely on additional real data or offline replay of original concept data, our approach enables on-the-fly knowledge distillation to retain the previous concepts while learning new ones, without accessing any previous data. Second, we develop an In-Context Generation (ICGen) paradigm that allows the diffusion model to be conditioned on the input vision context, which facilitates few-shot generation and mitigates previous concepts forgetting. Extensive experiments show that the proposed Lifelong Few-Shot Diffusion (LFS-Diffusion) method can produce high-quality and accurate images while maintaining previously learned knowledge.
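The abstract does not give the distillation objective, so the following is only a minimal sketch of the general idea of data-free distillation, not the LFS-Diffusion loss. It assumes a frozen copy of the previous model (the teacher) and a trainable student, and it probes both with freshly sampled Gaussian noise so that no earlier training data is stored or replayed. The toy linear denoiser, the function names (`predict`, `distillation_loss`, `total_loss`), and the weight `lam` are all hypothetical and chosen for illustration.

```python
import random

def predict(weights, x):
    """Toy stand-in for a denoiser: elementwise scaling of the noisy input."""
    return [w * xi for w, xi in zip(weights, x)]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def distillation_loss(student, teacher, dim, n_samples, rng):
    """Data-free term: probe both models with sampled noise, not stored data."""
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += mse(predict(student, x), predict(teacher, x))
    return total / n_samples

def total_loss(student, teacher, new_batch, lam, rng):
    """New-task loss plus a weighted penalty for drifting from the teacher.

    new_batch is a list of (noisy_input, target) pairs from the new task only.
    """
    task = sum(mse(predict(student, x), y) for x, y in new_batch) / len(new_batch)
    distill = distillation_loss(student, teacher, len(student), 8, rng)
    return task + lam * distill

if __name__ == "__main__":
    rng = random.Random(0)
    teacher = [1.0, 2.0, 3.0]          # frozen weights from the previous task
    student = [1.1, 1.9, 3.2]          # weights being adapted to the new task
    batch = [([1.0, 1.0, 1.0], [1.2, 1.8, 3.1])]
    print(total_loss(student, teacher, batch, lam=0.5, rng=rng))
```

In this sketch, a larger `lam` keeps the student closer to the teacher's behavior at the cost of slower adaptation to the new concept; how the actual method balances the two is not stated in the abstract.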