CVAIApr 17, 2025

MGT: Extending Virtual Try-Off to Multi-Garment Scenarios

arXiv:2504.13078v26 citationsh-index: 72025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Originality Incremental advance
AI Analysis

This work addresses a domain-specific challenge in fashion industry computer vision, enabling more accurate virtual try-on and try-off applications.

The paper tackles the problem of extracting standardized garment images from photos of clothed individuals in multi-garment scenarios, achieving state-of-the-art results on VITON-HD and competitive performance on DressCode.

Computer vision is transforming fashion industry through Virtual Try-On (VTON) and Virtual Try-Off (VTOFF). VTON generates images of a person in a specified garment using a target photo and a standardized garment image, while a more challenging variant, Person-to-Person Virtual Try-On (p2p-VTON), uses a photo of another person wearing the garment. VTOFF, in contrast, extracts standardized garment images from photos of clothed individuals. We introduce Multi-Garment TryOffDiff (MGT), a diffusion-based VTOFF model capable of handling diverse garment types, including upper-body, lower-body, and dresses. MGT builds on a latent diffusion architecture with SigLIP-based image conditioning to capture garment characteristics such as shape, texture, and pattern. To address garment diversity, MGT incorporates class-specific embeddings, achieving state-of-the-art VTOFF results on VITON-HD and competitive performance on DressCode. When paired with VTON models, it further enhances p2p-VTON by reducing unwanted attribute transfer, such as skin tone, ensuring preservation of person-specific characteristics. Demo, code, and models are available at: https://rizavelioglu.github.io/tryoffdiff/

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes