Synthesizing Artifact Dataset for Pixel-level Detection
This work provides a scalable solution for creating pixel-level artifact annotation datasets, benefiting researchers and developers in computer vision and generative AI by reducing reliance on manual labeling.
The paper tackled the problem of training artifact detectors for image-generative models by addressing the lack of expensive pixel-level human annotations, proposing an artifact corruption pipeline that automatically injects artifacts into synthetic images to generate annotations, resulting in performance improvements of 13.2% for ConvNeXt and 3.7% for Swin-T compared to baselines.
Artifact detectors have been shown to enhance the performance of image-generative models by serving as reward models during fine-tuning. These detectors enable the generative model to improve overall output fidelity and aesthetics. However, training the artifact detector requires expensive pixel-level human annotations that specify the artifact regions. The lack of annotated data limits the performance of the artifact detector. A naive pseudo-labeling approach-training a weak detector and using it to annotate unlabeled images-suffers from noisy labels, resulting in poor performance. To address this, we propose an artifact corruption pipeline that automatically injects artifacts into clean, high-quality synthetic images on a predetermined region, thereby producing pixel-level annotations without manual labeling. The proposed method enables training of an artifact detector that achieves performance improvements of 13.2% for ConvNeXt and 3.7% for Swin-T, as verified on human-labeled data, compared to baseline approaches. This work represents an initial step toward scalable pixel-level artifact annotation datasets that integrate world knowledge into artifact detection.