CV, GR | Aug 13, 2024

Generative Photomontage

arXiv:2408.07116v3 | 3 citations | h-index: 11
Originality Highly original
AI Analysis

This work addresses a common problem for users of text-to-image models, who struggle to obtain a single image that captures every desired element. It offers an incremental improvement through a novel compositing approach.

The paper tackles the problem of text-to-image models producing inconsistent results by proposing a framework that allows users to composite desired images from parts of multiple generated images, achieving faithful preservation of user-selected regions with harmonious blending.

Text-to-image models are powerful tools for image creation. However, the generation process is akin to a dice roll and makes it difficult to achieve a single image that captures everything a user wants. In this paper, we propose a framework for creating the desired image by compositing it from various parts of generated images, in essence forming a Generative Photomontage. Given a stack of images generated by ControlNet using the same input condition and different seeds, we let users select desired parts from the generated results using a brush stroke interface. We introduce a novel technique that takes in the user's brush strokes, segments the generated images using a graph-based optimization in diffusion feature space, and then composites the segmented regions via a new feature-space blending method. Our method faithfully preserves the user-selected regions while compositing them harmoniously. We demonstrate that our flexible framework can be used for many applications, including generating new appearance combinations, fixing incorrect shapes and artifacts, and improving prompt alignment. We show compelling results for each application and demonstrate that our method outperforms existing image blending methods and various baselines.
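The segment-then-composite idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the energy terms or the optimizer, so the code below assumes a photomontage-style seam cost (feature disagreement between two images on either side of a seam), treats brush strokes as hard constraints, and swaps in iterated conditional modes (ICM) for a real graph-cut solver. The names `seam_cost`, `segment`, and `composite_features` are hypothetical, and the per-pixel feature selection only stands in for the feature-space blending step.

```python
from math import dist

# features[k][r][c] is the (toy) feature vector of image k at pixel (r, c).
# strokes maps (r, c) -> image index and acts as a hard constraint.

def seam_cost(features, p, q, a, b):
    """Cost of a seam between neighbours p (from image a) and q (from image b).

    Assumed form: the seam is cheap where the two images agree in feature space.
    """
    if a == b:
        return 0.0
    (pr, pc), (qr, qc) = p, q
    return (dist(features[a][pr][pc], features[b][pr][pc])
            + dist(features[a][qr][qc], features[b][qr][qc]))

def segment(features, strokes, height, width, sweeps=10):
    """Assign each pixel a source image with ICM (a stand-in for graph cut)."""
    n_images = len(features)
    labels = [[0] * width for _ in range(height)]
    for (r, c), k in strokes.items():
        labels[r][c] = k
    for _ in range(sweeps):
        changed = False
        for r in range(height):
            for c in range(width):
                if (r, c) in strokes:
                    continue  # stroked pixels keep the user's choice
                neighbours = [(r + dr, c + dc)
                              for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                              if 0 <= r + dr < height and 0 <= c + dc < width]

                def local_cost(k):
                    return sum(seam_cost(features, (r, c), q, k, labels[q[0]][q[1]])
                               for q in neighbours)

                best = min(range(n_images), key=local_cost)
                if local_cost(best) < local_cost(labels[r][c]):
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels

def composite_features(features, labels):
    """Pick each pixel's feature from the image chosen by the label map."""
    return [[features[k][r][c] for c, k in enumerate(row)]
            for r, row in enumerate(labels)]

# Two 1x4 "images" that agree in the middle columns and differ at the ends.
features = [
    [[(0,), (0,), (5,), (9,)]],  # image 0
    [[(7,), (0,), (5,), (1,)]],  # image 1
]
strokes = {(0, 0): 0, (0, 3): 1}  # keep the left end of image 0, right end of image 1

labels = segment(features, strokes, height=1, width=4)
composite = composite_features(features, labels)
print(labels)     # [[0, 0, 1, 1]]
print(composite)  # [[(0,), (0,), (5,), (1,)]]
```

In this toy case the seam lands between the two middle columns, where both images have identical features, so the stroked regions are preserved and the transition is free. ICM only finds a local minimum; a multi-label graph-cut solver would be the more faithful choice for real feature maps.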
