CV · Nov 25, 2024

Controllable Human Image Generation with Personalized Multi-Garments

arXiv:2411.16801v3 · 3 citations · h-index: 8 · CVPR
Originality: Incremental advance
AI Analysis

This paper addresses the data acquisition problem for fashion-related AI applications and offers a scalable route to virtual try-on and other controllable generation tasks, though its approach is incremental.

The paper tackles the challenge of generating controllable human images with multiple reference garments by proposing BootComp, a framework that uses a synthetic dataset to train a diffusion model, achieving high-quality results with fine-grained detail preservation.

We present BootComp, a novel framework based on text-to-image diffusion models for controllable human image generation with multiple reference garments. Here, the main bottleneck is data acquisition for training: collecting a large-scale dataset of high-quality reference garment images per human subject is quite challenging, since, ideally, one needs to manually gather a photograph of every single garment worn by each human. To address this, we propose a data generation pipeline that constructs a large synthetic dataset of human and multiple-garment pairs by introducing a model that extracts reference garment images from each human image. To ensure data quality, we also propose a filtering strategy that removes undesirable generated data by measuring the perceptual similarity between the garment shown in the human image and the extracted garment. Finally, using the constructed synthetic dataset, we train a diffusion model with two parallel denoising paths that uses multiple garment images as conditions to generate human images while preserving their fine-grained details. We further show the wide applicability of our framework by adapting it to different types of reference-based generation in the fashion domain, including virtual try-on and controllable human image generation with other conditions, e.g., pose and face.
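
The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the perceptual metric or the cutoff, so everything here is an assumption for illustration: cosine similarity over precomputed feature vectors stands in for the perceptual similarity measure, and the names `cosine_similarity`, `filter_pairs`, the threshold of 0.8, and the toy feature vectors are all hypothetical rather than taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# (pair id, features of the garment as worn, features of the extracted garment)
# The feature vectors are assumed to come from some perceptual encoder;
# which encoder the paper uses is not stated in the abstract.
Pair = Tuple[str, Sequence[float], Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(pairs: List[Pair], threshold: float = 0.8) -> List[str]:
    """Keep the ids of pairs whose similarity reaches the (hypothetical) threshold."""
    kept = []
    for pair_id, worn_feat, extracted_feat in pairs:
        if cosine_similarity(worn_feat, extracted_feat) >= threshold:
            kept.append(pair_id)
    return kept


# Toy data: sample_a's extracted garment resembles the worn one, sample_b's does not.
pairs = [
    ("sample_a", [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    ("sample_b", [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
]
kept_ids = filter_pairs(pairs)
print(kept_ids)
```

With these toy vectors only `sample_a` survives the filter. In a real pipeline the threshold would need to be tuned against manually inspected samples, since a cutoff that is too strict shrinks the synthetic dataset and one that is too loose admits poorly extracted garments.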

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
