CV · Jul 21, 2024

D$^4$-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On

Zhaotong Yang, Zicheng Jiang, Xinzhe Li, Huiyu Zhou, Junyu Dong, Huaidong Zhang, Yong Du

arXiv:2407.15111v112.115 citations · h-index: 4 · Has Code

Originality: Incremental advance

AI Analysis

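This work addresses image-based virtual try-on for fashion and e-commerce applications. It is an incremental improvement whose new components target two bottlenecks: semantic inconsistency before and after garment warping, and the ambiguity diffusion-based models face when inpainting and denoising simultaneously.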

The paper tackles challenges in image-based virtual try-on, such as semantic inconsistencies and reliance on static, annotation-driven clothing parsers, by introducing D$^4$-VTON. The method combines dynamic semantics disentangling with a differential information tracking path, and the authors report that it outperforms existing methods in both quantitative and qualitative evaluations.

In this paper, we introduce D$^4$-VTON, an innovative solution for image-based virtual try-on. We address challenges from previous studies, such as semantic inconsistencies before and after garment warping, and reliance on static, annotation-driven clothing parsers. Additionally, we tackle the complexities in diffusion-based VTON models when handling simultaneous tasks like inpainting and denoising. Our approach utilizes two key technologies: Firstly, Dynamic Semantics Disentangling Modules (DSDMs) extract abstract semantic information from garments to create distinct local flows, improving precise garment warping in a self-discovered manner. Secondly, by integrating a Differential Information Tracking Path (DITP), we establish a novel diffusion-based VTON paradigm. This path captures differential information between incomplete try-on inputs and their complete versions, enabling the network to handle multiple degradations independently, thereby minimizing learning ambiguities and achieving realistic results with minimal overhead. Extensive experiments demonstrate that D$^4$-VTON significantly outperforms existing methods in both quantitative metrics and qualitative evaluations, demonstrating its capability in generating realistic images and ensuring semantic consistency.
