CV · LG · Oct 10, 2023

ObjectComposer: Consistent Generation of Multiple Objects Without Fine-tuning

Georgia Tech
arXiv:2310.06968v1 · 1 citation · h-index: 48
Originality: Incremental advance
AI Analysis

ObjectComposer addresses the need for scalable, real-time consistent object generation in applications such as comic book illustration. It is an incremental improvement that builds on existing models without modifying their weights.

The paper tackles the failure of text-to-image models to generate the same objects consistently across different contexts. It introduces ObjectComposer, a training-free method that generates multiple objects consistently from reference images.

Recent text-to-image generative models can generate high-fidelity images from text prompts. However, these models struggle to consistently generate the same objects in different contexts with the same appearance. Consistent object generation is important to many downstream tasks, such as generating comic book illustrations with consistent characters and settings. Numerous approaches attempt to solve this problem by extending the vocabulary of diffusion models through fine-tuning. However, even lightweight fine-tuning approaches can be prohibitively expensive to run at scale and in real time. We introduce a method called ObjectComposer for generating compositions of multiple objects that resemble user-specified images. Our approach is training-free, leveraging the abilities of preexisting models. We build upon the recent BLIP-Diffusion model, which can generate images of single objects specified by reference images. ObjectComposer enables the consistent generation of compositions containing multiple specific objects simultaneously, all without modifying the weights of the underlying models.

