CV · Jun 15, 2024

Self-Supervised Vision Transformer for Enhanced Virtual Clothes Try-On

Lingxiao Lu, Shengyi Wu, Haoxuan Sun, Junhong Gou, Jianlou Si, Chen Qian, Jianfu Zhang, Liqing Zhang

arXiv:2406.10539v12.0

Originality: Incremental advance

AI Analysis

The work addresses the need for better visualization tools in e-commerce, though it appears incremental because it builds on existing methods such as ViT and diffusion models.

The paper aims to improve realism and detail in virtual clothes try-on for online shopping by pairing a self-supervised Vision Transformer with a diffusion model. The authors report substantial improvements over existing methods.

Virtual clothes try-on has emerged as a vital feature in online shopping, offering consumers a critical tool to visualize how clothing fits. In our research, we introduce an innovative approach for virtual clothes try-on, utilizing a self-supervised Vision Transformer (ViT) coupled with a diffusion model. Our method emphasizes detail enhancement by contrasting local clothing image embeddings, generated by ViT, with their global counterparts. Techniques such as conditional guidance and focus on key regions have been integrated into our approach. These combined strategies empower the diffusion model to reproduce clothing details with increased clarity and realism. The experimental results showcase substantial advancements in the realism and precision of details in virtual try-on experiences, significantly surpassing the capabilities of existing technologies.
