Patch2Pix: Epipolar-Guided Pixel-Level Correspondences
This work addresses the low matching resolution of correspondence networks used for visual localization by adding a refinement step, which benefits applications that require precise image matching.
The authors propose a detect-to-refine approach in which patch-level match proposals are predicted first and then refined to pixel-level matches by Patch2Pix, a novel refinement network. This refinement significantly improves the performance of correspondence networks on image matching, homography estimation, and localization tasks, and it achieves state-of-the-art localization performance when applied to fully-supervised methods without re-training.
The classical matching pipeline used for visual localization typically involves three steps: (i) local feature detection and description, (ii) feature matching, and (iii) outlier rejection. Recently emerged correspondence networks propose to perform those steps inside a single network but suffer from low matching resolution due to the memory bottleneck. In this work, we propose a new perspective to estimate correspondences in a detect-to-refine manner, where we first predict patch-level match proposals and then refine them. We present Patch2Pix, a novel refinement network that refines match proposals by regressing pixel-level matches from the local regions defined by those proposals and jointly rejecting outlier matches with confidence scores. Patch2Pix is weakly supervised to learn correspondences that are consistent with the epipolar geometry of an input image pair. We show that our refinement network significantly improves the performance of correspondence networks on image matching, homography estimation, and localization tasks. In addition, we show that our learned refinement generalizes to fully-supervised methods without re-training, which leads us to state-of-the-art localization performance. The code is available at https://github.com/GrumpyZhou/patch2pix.
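To make "consistent with the epipolar geometry" concrete, the sketch below scores a single pixel-level match against a known fundamental matrix using the Sampson distance, a standard first-order approximation of the geometric epipolar error. The abstract does not state which measure Patch2Pix uses, so the choice of the Sampson distance, the function name `sampson_distance`, and the toy fundamental matrix are all assumptions made for illustration, not the authors' implementation.

```python
def sampson_distance(F, p1, p2):
    """Sampson distance of the match (p1, p2) under fundamental matrix F.

    F  : 3x3 nested list such that x2^T F x1 = 0 for a perfect match
    p1 : (x, y) pixel in image 1
    p2 : (x, y) pixel in image 2
    """
    # Homogeneous coordinates of both pixels.
    x1 = (p1[0], p1[1], 1.0)
    x2 = (p2[0], p2[1], 1.0)

    # Epipolar line of x1 in image 2: F x1.
    Fx1 = [sum(F[i][j] * x1[j] for j in range(3)) for i in range(3)]
    # Epipolar line of x2 in image 1: F^T x2.
    Ftx2 = [sum(F[i][j] * x2[i] for i in range(3)) for j in range(3)]

    # Algebraic epipolar residual x2^T F x1.
    residual = sum(x2[i] * Fx1[i] for i in range(3))

    # Normalize by the squared gradient norm of the residual.
    denom = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    return residual ** 2 / denom if denom > 0 else float("inf")


# Toy fundamental matrix for a purely horizontal camera translation
# (assumed for illustration): corresponding points share the same row.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]

print(sampson_distance(F, (10.0, 20.0), (35.0, 20.0)))  # 0.0, same row
print(sampson_distance(F, (10.0, 20.0), (35.0, 24.0)))  # 8.0, 4 px off the epipolar line
```

A score of this kind requires only the relative camera geometry of the image pair, not ground-truth pixel correspondences, which is what makes it usable as a weak supervision signal: a match far from its epipolar line can be penalized or assigned a low confidence, while a match on the line is only known to be geometrically plausible.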