CV · AI · Sep 8, 2023

Create Your World: Lifelong Text-to-Image Diffusion

Gan Sun, Wenqi Liang, Jiahua Dong, Jun Li, Zhengming Ding, Yang Cong

arXiv:2309.04430v1 · 22.259 citations · h-index: 49

Originality: Incremental advance

AI Analysis

This work addresses catastrophic forgetting and catastrophic neglecting in diffusion models for users who need personalized, evolving image generation. It appears incremental, however, because it builds on existing text-to-image frameworks.

The paper tackles the problem of lifelong text-to-image generation, where models must learn new user concepts continuously without forgetting past ones, and achieves improved performance in generating faithful images across continual prompts compared to state-of-the-art models, as shown by qualitative and quantitative metrics.

Text-to-image generative models can produce diverse, high-quality images of concepts from a text prompt, and they have demonstrated excellent ability in image generation, image translation, and related tasks. In this work, we study the problem of synthesizing instantiations of a user's own concepts in a never-ending manner, i.e., create your world, where new concepts from the user are learned quickly from a few examples. To achieve this goal, we propose a Lifelong text-to-image Diffusion Model (L2DM), which aims to overcome knowledge "catastrophic forgetting" of previously encountered concepts and semantic "catastrophic neglecting" of one or more concepts in the text prompt. To address knowledge "catastrophic forgetting", our L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module, which respectively safeguard the knowledge of prior concepts and of each past personalized concept. To address semantic "catastrophic neglecting" when generating images from a user text prompt, a concept attention artist module alleviates semantic neglecting at the concept level, and an orthogonal attention module reduces semantic binding at the attribute level. As a result, our model generates more faithful images across a range of continual text prompts than related state-of-the-art models, in terms of both qualitative and quantitative metrics. The code will be released at https://wenqiliang.github.io/.
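The abstract names L2DM's modules but does not describe how they work internally. The sketch below is a rough, hypothetical illustration of two generic regularizers often used for similar goals. The first is a distillation penalty that keeps the current model's output for a past concept close to a frozen previous model's output, which targets forgetting. The second is a pairwise orthogonality penalty on per-concept cross-attention maps, which discourages two concepts from claiming the same image region and so targets neglecting. These are not the paper's modules. The function names `distillation_loss` and `orthogonality_penalty`, the scalar `weight`, and the representation of predictions and attention maps as flat Python lists are all assumptions made for illustration.

```python
import math
from itertools import combinations
from typing import Sequence


def distillation_loss(old_pred: Sequence[float], new_pred: Sequence[float],
                      weight: float = 1.0) -> float:
    """Hypothetical weighted mean squared error between a frozen previous
    model's output (e.g. predicted noise for a past concept's prompt) and
    the current model's output for the same input. `weight` is an assumed
    per-concept scale, not a parameter taken from the paper."""
    if len(old_pred) != len(new_pred):
        raise ValueError("predictions must have the same length")
    if not old_pred:
        raise ValueError("predictions must be non-empty")
    squared_error = sum((o - n) ** 2 for o, n in zip(old_pred, new_pred))
    return weight * squared_error / len(old_pred)


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def orthogonality_penalty(attn_maps: Sequence[Sequence[float]]) -> float:
    """Hypothetical penalty: sum of squared cosine similarities over every
    pair of flattened cross-attention maps (one map per concept token).
    It is zero when all maps are mutually orthogonal; for non-negative maps
    this means the concepts attend to non-overlapping positions."""
    return sum(_cosine(a, b) ** 2 for a, b in combinations(attn_maps, 2))


if __name__ == "__main__":
    # Toy 4-element "predictions" standing in for a past concept's noise estimate.
    print(distillation_loss([0.1, -0.2, 0.3, 0.0], [0.1, -0.1, 0.3, 0.2]))
    # Two toy 2x2 attention maps, flattened; they overlap on the first position.
    print(orthogonality_penalty([[1, 1, 0, 0], [1, 0, 0, 0]]))
```

In a training loop under these assumptions, both terms would be added to the usual denoising loss with tuning coefficients. Squaring the cosine treats positive and negative correlation alike, and it gives a smooth penalty that vanishes exactly at orthogonality.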

