VTONGuard: Automatic Detection and Authentication of AI-Generated Virtual Try-On Content
The work addresses authenticity concerns in e-commerce and digital entertainment by providing a benchmark and a detector that support responsible use of virtual try-on technology. The contribution is incremental, however, because it builds on existing detection paradigms.
The paper tackles the detection of AI-generated virtual try-on content. It introduces VTONGuard, a large-scale benchmark dataset of over 775,000 images, and proposes a multi-task framework that achieves the best overall performance on this benchmark.
With the rapid advancement of generative AI, virtual try-on (VTON) systems are becoming increasingly common in e-commerce and digital entertainment. However, the growing realism of AI-generated try-on content raises pressing concerns about authenticity and responsible use. To address this, we present VTONGuard, a large-scale benchmark dataset containing over 775,000 real and synthetic try-on images. The dataset covers diverse real-world conditions, including variations in pose, background, and garment styles, and provides both authentic and manipulated examples. Based on this benchmark, we conduct a systematic evaluation of multiple detection paradigms under unified training and testing protocols. Our results reveal each method's strengths and weaknesses and highlight the persistent challenge of cross-paradigm generalization. To further advance detection, we design a multi-task framework that integrates auxiliary segmentation to enhance boundary-aware feature learning, achieving the best overall performance on VTONGuard. We expect this benchmark to enable fair comparisons, facilitate the development of more robust detection models, and promote the safe and responsible deployment of VTON technologies in practice.
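The abstract does not state how the detection and auxiliary segmentation objectives are combined. As a minimal sketch, one common way to train such a multi-task model is to add a weighted per-pixel segmentation loss to the real-versus-synthetic classification loss. The function names, the use of binary cross-entropy for both terms, and the `seg_weight` value below are assumptions made for illustration, not details taken from the paper.

```python
import math


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def multitask_loss(cls_prob, cls_label, seg_probs, seg_mask, seg_weight=0.5):
    """Classification loss plus a weighted auxiliary segmentation loss.

    cls_prob   : predicted probability that the image is synthetic
    cls_label  : 1 for synthetic, 0 for real
    seg_probs  : 2D list of per-pixel probabilities for the edited region
    seg_mask   : 2D list of 0/1 ground-truth mask values, same shape
    seg_weight : hypothetical weight on the auxiliary term (an assumption)
    """
    detection = bce(cls_prob, cls_label)
    pixels = [
        (p, m)
        for row_p, row_m in zip(seg_probs, seg_mask)
        for p, m in zip(row_p, row_m)
    ]
    segmentation = sum(bce(p, m) for p, m in pixels) / len(pixels)
    return detection + seg_weight * segmentation


# Toy example: a 2x2 mask where the predicted region roughly matches the truth.
loss = multitask_loss(
    cls_prob=0.9,
    cls_label=1,
    seg_probs=[[0.8, 0.2], [0.7, 0.1]],
    seg_mask=[[1, 0], [1, 0]],
)
```

In this formulation the segmentation term only shapes the shared features during training. Setting `seg_weight` to zero recovers a plain binary detector, which makes the auxiliary task easy to ablate.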