CV · Aug 11, 2025

MuGa-VTON: Multi-Garment Virtual Try-On via Diffusion Transformers with Prompt Customization

arXiv:2508.08488v1 · 1 citation · h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses the need for more realistic and flexible virtual try-on in fashion retail and personalization, though it appears incremental because it builds on existing diffusion models.

The paper tackles virtual try-on by proposing MuGa-VTON, a unified multi-garment diffusion framework that jointly models upper and lower garments together with person identity. It reports outperforming existing methods on benchmarks such as VITON-HD and DressCode, with high-fidelity, identity-preserving results.

Virtual try-on seeks to generate photorealistic images of individuals in desired garments, a task that must simultaneously preserve personal identity and garment fidelity for practical use in fashion retail and personalization. However, existing methods typically handle upper and lower garments separately, rely on heavy preprocessing, and often fail to preserve person-specific cues such as tattoos, accessories, and body shape, resulting in limited realism and flexibility. To this end, we introduce MuGa-VTON, a unified multi-garment diffusion framework that jointly models upper and lower garments together with person identity in a shared latent space. Specifically, we propose three key modules: the Garment Representation Module (GRM) for capturing garment semantics, the Person Representation Module (PRM) for encoding identity and pose cues, and the A-DiT fusion module, which integrates garment, person, and text-prompt features through a diffusion transformer. This architecture supports prompt-based customization, allowing fine-grained garment modifications with minimal user input. Extensive experiments on the VITON-HD and DressCode benchmarks demonstrate that MuGa-VTON outperforms existing methods in both qualitative and quantitative evaluations, producing high-fidelity, identity-preserving results suitable for real-world virtual try-on applications.
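To make the fusion idea concrete, here is a minimal sketch of how garment, person, and text-prompt features could condition a set of latent tokens. The abstract does not say how A-DiT combines these streams, so everything here is an assumption for illustration: the streams are concatenated into one context sequence, and the latent tokens attend to it with a single-head, projection-free cross-attention followed by a residual connection. The names `cross_attention`, `fuse`, and `random_tokens` are hypothetical, and random vectors stand in for the outputs of GRM, PRM, and a text encoder.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, context):
    """Single-head scaled dot-product attention with no learned projections.

    Toy simplification: the context tokens act as both keys and values.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in context])
        attended.append(
            [sum(w * tok[i] for w, tok in zip(weights, context)) for i in range(dim)]
        )
    return attended


def fuse(latent, upper, lower, person, text):
    """Hypothetical fusion step (an assumption, not the paper's A-DiT)."""
    # Assumption: all conditioning streams are concatenated into one sequence.
    context = upper + lower + person + text
    attended = cross_attention(latent, context)
    # Residual connection: each latent token keeps its own content.
    return [[x + a for x, a in zip(tok, att)] for tok, att in zip(latent, attended)]


def random_tokens(count, dim, rng):
    """Random stand-ins for encoder outputs."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM = 8
latent = random_tokens(4, DIM, rng)   # noisy image latent tokens
upper = random_tokens(3, DIM, rng)    # stand-in for GRM upper-garment features
lower = random_tokens(3, DIM, rng)    # stand-in for GRM lower-garment features
person = random_tokens(2, DIM, rng)   # stand-in for PRM identity and pose features
text = random_tokens(2, DIM, rng)     # stand-in for text-prompt features

fused = fuse(latent, upper, lower, person, text)
print(len(fused), len(fused[0]))
```

Because both garments sit in the same context sequence, every latent token can draw on upper-garment, lower-garment, person, and prompt features in one step, which is one plausible reading of "jointly models" in the abstract. A real diffusion transformer would add learned projections, multiple heads, timestep conditioning, and many stacked blocks.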

