CVDec 5, 2023

Customization Assistant for Text-to-image Generation

Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Tong Sun

arXiv:2312.03045v212.116 citationsh-index: 13CVPR

Originality Incremental advance

AI Analysis

This addresses the need for efficient and user-friendly customization in text-to-image generation, though it is incremental as it builds on existing models.

The paper tackles the problem of customizing text-to-image generation without fine-tuning on test images and limited user interactions, achieving generation in 2-5 seconds with competitive results across domains.

Customizing pre-trained text-to-image generation model has attracted massive research interest recently, due to its huge potential in real-world applications. Although existing methods are able to generate creative content for a novel concept contained in single user-input image, their capability are still far from perfection. Specifically, most existing methods require fine-tuning the generative model on testing images. Some existing methods do not require fine-tuning, while their performance are unsatisfactory. Furthermore, the interaction between users and models are still limited to directive and descriptive prompts such as instructions and captions. In this work, we build a customization assistant based on pre-trained large language model and diffusion model, which can not only perform customized generation in a tuning-free manner, but also enable more user-friendly interactions: users can chat with the assistant and input either ambiguous text or clear instruction. Specifically, we propose a new framework consists of a new model design and a novel training strategy. The resulting assistant can perform customized generation in 2-5 seconds without any test time fine-tuning. Extensive experiments are conducted, competitive results have been obtained across different domains, illustrating the effectiveness of the proposed method.

View on arXiv PDF

Similar