CV, Mar 4, 2024

OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

arXiv:2403.01779v2, 184 citations, h-index: 10, Has Code, AAAI
AI Analysis

This work addresses the need for more realistic and controllable virtual try-on systems in e-commerce and fashion, representing a significant advancement rather than an incremental improvement.

The paper tackles the problem of generating realistic and controllable virtual try-on images by introducing OOTDiffusion, which uses a latent diffusion model with an outfitting UNet and fusion mechanism, achieving high-quality results that outperform other methods on VITON-HD and Dress Code datasets.

We present OOTDiffusion, a novel network architecture for realistic and controllable image-based virtual try-on (VTON). We leverage the power of pretrained latent diffusion models, designing an outfitting UNet to learn the garment detail features. Without a redundant warping process, the garment features are precisely aligned with the target human body via the proposed outfitting fusion in the self-attention layers of the denoising UNet. In order to further enhance the controllability, we introduce outfitting dropout to the training process, which enables us to adjust the strength of the garment features through classifier-free guidance. Our comprehensive experiments on the VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently generates high-quality try-on results for arbitrary human and garment images, which outperforms other VTON methods in both realism and controllability, indicating an impressive breakthrough in virtual try-on. Our source code is available at https://github.com/levihsu/OOTDiffusion.
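The abstract's controllability mechanism (outfitting dropout during training, classifier-free guidance at inference) can be illustrated with a minimal NumPy sketch. It is not taken from the authors' repository. The function names `outfitting_dropout` and `guided_noise`, the use of an all-zero array as the "no garment" input, and the array shapes are assumptions made for illustration. Only the guidance formula itself is the standard classifier-free guidance rule.

```python
import numpy as np

def outfitting_dropout(garment_latent, drop_prob, rng):
    """Training-time sketch: with probability drop_prob, hide the garment.

    Assumption: the 'dropped' condition is represented by zeros, so the
    denoiser also learns to predict noise without garment features.
    """
    if rng.random() < drop_prob:
        return np.zeros_like(garment_latent)
    return garment_latent

def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Standard classifier-free guidance applied to two noise predictions.

    guidance_scale = 0 gives the unconditional prediction, 1 gives the
    garment-conditioned one, and values above 1 push further toward the
    garment condition.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Hypothetical shapes and values, for illustration only.
rng = np.random.default_rng(0)
garment = rng.standard_normal((4, 8, 8))
eps_u = rng.standard_normal((4, 8, 8))   # prediction without garment
eps_c = rng.standard_normal((4, 8, 8))   # prediction with garment
eps = guided_noise(eps_u, eps_c, guidance_scale=2.0)
```

In this sketch the guidance scale is the single knob for garment strength: because dropout exposes the model to both conditioned and unconditioned inputs during training, the two predictions can be extrapolated at inference time.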

Code Implementations: 2 repos