CV · Sep 22, 2024

GroupDiff: Diffusion-based Group Portrait Editing

arXiv:2409.14379v1 · 4 citations · h-index: 24
Originality: Incremental advance
AI Analysis

The paper addresses a practical problem for users who want to edit group photos, but the advance is incremental: it builds on existing diffusion models with task-specific adaptations.

The paper tackles group portrait editing by proposing GroupDiff, a diffusion-based method that enables adding, deleting, or manipulating persons in photos while preserving appearance and offering control, achieving state-of-the-art performance in experiments.

Group portrait editing is highly desirable since users constantly want to add a person, delete a person, or manipulate existing persons. It is also challenging due to the intricate dynamics of human interactions and the diversity of gestures. In this work, we present GroupDiff, a pioneering effort to tackle group photo editing with three dedicated contributions: 1) Data Engine: Since there is no labeled data for group photo editing, we create a data engine to generate paired data for training. The training data engine covers the diverse needs of group portrait editing. 2) Appearance Preservation: To keep the appearance consistent after editing, we inject the images of persons from the group photo into the attention modules and employ skeletons to provide intra-person guidance. 3) Control Flexibility: Bounding boxes indicating the location of each person are used to reweight the attention matrix so that the features of each person are injected into the correct places. This inter-person guidance allows flexible manipulation. Extensive experiments demonstrate that GroupDiff exhibits state-of-the-art performance compared to existing methods. GroupDiff offers controllability for editing and maintains the fidelity of the original photos.
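
The bounding-box reweighting in contribution 3 can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the reweighting formula, so the additive logit `boost`, the function names, and the row-major grid layout are all assumptions made for illustration. The idea shown is only that a query located inside person p's box attends more strongly to reference features that belong to person p.

```python
import math

def in_box(x, y, box):
    """True if grid cell (x, y) lies inside box = (x0, y0, x1, y1), half-open."""
    x0, y0, x1, y1 = box
    return x0 <= x < x1 and y0 <= y < y1

def reweighted_attention(queries, keys, values, key_owner, boxes, width, boost=4.0):
    """Hypothetical box-guided attention (illustrative, not the paper's formula).

    queries:   one d-dim vector per grid cell, row-major, `width` columns
    keys:      d-dim vectors taken from reference person features
    values:    value vectors aligned with `keys`
    key_owner: key_owner[j] is the index of the person that key j belongs to
    boxes:     boxes[p] is the target bounding box of person p, in grid cells
    boost:     assumed additive bias applied to the attention logits
    """
    d = len(queries[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        y, x = divmod(i, width)  # recover the grid position of query i
        logits = []
        for j, k in enumerate(keys):
            score = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            # Boost keys whose owner's box contains this query position.
            if in_box(x, y, boxes[key_owner[j]]):
                score += boost
            logits.append(score)
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in logits]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        outputs.append([sum(wj * v[c] for wj, v in zip(row, values))
                        for c in range(len(values[0]))])
    return outputs, weights
```

With two persons whose boxes split a 4x4 grid into left and right halves, queries in the left half place most of their attention mass on person 0's keys, and queries in the right half on person 1's. Adding `boost` to a logit is equivalent to multiplying the unnormalized attention weight by `exp(boost)`, which is one simple way to read "reweighting" the attention matrix.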

Code Implementations: 1 repo