CV · Feb 3, 2025

MFP-VTON: Enhancing Mask-Free Person-to-Person Virtual Try-On via Diffusion Transformer

Le Shen, Yanting Kang, Rong Huang, Zhijie Wang

arXiv:2502.01626v110.22 citations · h-index: 2

Originality: Incremental advance

AI Analysis

The paper makes virtual try-on easier to use by removing the need for garment masks. The advance is incremental because it builds on existing diffusion transformer methods.

The paper tackles person-to-person virtual try-on without garment masks. It proposes MFP-VTON, adapts a garment-to-person model and dataset to build person-to-person training data, and reports high-fidelity image generation in both person-to-person and garment-to-person tasks.

The garment-to-person virtual try-on (VTON) task, which aims to generate fitting images of a person wearing a reference garment, has made significant strides. However, obtaining a standard garment is often more challenging than using the garment already worn by the person. To improve ease of use, we propose MFP-VTON, a Mask-Free framework for Person-to-Person VTON. Recognizing the scarcity of person-to-person data, we adapt a garment-to-person model and dataset to construct a specialized dataset for this task. Our approach builds upon a pretrained diffusion transformer, leveraging its strong generative capabilities. During mask-free model fine-tuning, we introduce a Focus Attention loss to emphasize the garment of the reference person and the details outside the garment of the target person. Experimental results demonstrate that our model excels in both person-to-person and garment-to-person VTON tasks, generating high-fidelity fitting images.
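
The abstract does not give the formula for the Focus Attention loss, so the following is only an illustrative sketch of one way such a loss could be written, not the authors' definition. It assumes each generated-image token has an attention distribution over reference-person and target-person tokens, and it penalizes attention mass that falls outside the two regions the abstract names: the garment of the reference person and the non-garment area of the target person. The function name `focus_attention_loss`, the mask layout, and the toy numbers are all hypothetical.

```python
def focus_attention_loss(attn, focus_mask):
    """Hypothetical focus-style attention penalty (NOT the paper's formula).

    attn:       list of rows; each row is one query token's attention
                distribution over all key tokens (weights sum to 1).
    focus_mask: list of 0/1 flags, one per key token; 1 marks a key the
                model is encouraged to attend to.
    Returns the mean attention mass that lands outside the focus region.
    """
    if not attn:
        raise ValueError("attn must contain at least one query row")
    total = 0.0
    for row in attn:
        if len(row) != len(focus_mask):
            raise ValueError("each attention row must match the mask length")
        focused = sum(w for w, m in zip(row, focus_mask) if m)
        total += 1.0 - focused
    return total / len(attn)


# Assumed toy layout of key tokens:
# [reference garment, reference background, target garment, target non-garment]
ref_garment_mask = [1, 0]   # focus on the garment of the reference person
tgt_outside_mask = [0, 1]   # focus on target-person details outside the garment
focus_mask = ref_garment_mask + tgt_outside_mask

attn = [
    [0.7, 0.1, 0.1, 0.1],      # mostly on the focus region
    [0.25, 0.25, 0.25, 0.25],  # spread uniformly
]
loss = focus_attention_loss(attn, focus_mask)
perfect = focus_attention_loss([[0.5, 0.0, 0.0, 0.5]], focus_mask)
```

In this sketch the penalty is zero when all attention falls on the focus region and grows as attention leaks onto the reference background or the target's original garment. In a real fine-tuning setup a term like this would be added to the diffusion training loss with a weighting factor, which the abstract does not specify.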
