Match me if you can: Semi-Supervised Semantic Correspondence Learning with Unpaired Images
This paper addresses a data-scarcity bottleneck in semantic correspondence learning for computer vision, offering an incremental improvement over existing methods.
The paper tackles the scarcity of training keypoint pairs in semantic correspondence learning by using a simple machine annotator to densify training pairs without extra labels, achieving state-of-the-art results on the SPair-71k, PF-PASCAL, and PF-WILLOW benchmarks along with improved robustness on corruption benchmarks.
Semantic correspondence methods have advanced to the point of obtaining high-quality correspondences by employing complicated networks that aim to maximize model capacity. However, despite these performance improvements, they may remain constrained by the scarcity of training keypoint pairs, a consequence of the limited number of training images and the sparsity of keypoints. This paper builds on the hypothesis that learning semantic correspondences is inherently data-hungry, and shows that models can be trained further by employing densified training pairs. We demonstrate that a simple machine annotator reliably enriches paired keypoints from unlabeled images via machine supervision, requiring neither extra labeled keypoints nor trainable modules. Consequently, our models surpass current state-of-the-art models on semantic correspondence learning benchmarks such as SPair-71k, PF-PASCAL, and PF-WILLOW, and show further robustness on corruption benchmarks. Our code is available at https://github.com/naver-ai/matchme.