CV · Mar 30, 2023

PMatch: Paired Masked Image Modeling for Dense Geometric Matching

arXiv:2303.17342v1 · 17.846 citations · h-index: 70 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of pretraining cross-frame modules for dense geometric matching, which is important for computer vision applications like 3D reconstruction, but it is incremental as it builds on existing masked image modeling and transformer-based methods.

The paper tackles the problem of dense geometric matching between images of the same 3D structure by introducing a paired masked image modeling pretraining method and a cross-frame global matching module, achieving state-of-the-art performance on geometric matching benchmarks.

Dense geometric matching determines the dense pixel-wise correspondence between a source and a support image of the same 3D structure. Prior works employ an encoder of transformer blocks to correlate the two-frame features. However, existing monocular pretraining tasks, e.g., image classification and masked image modeling (MIM), cannot pretrain the cross-frame module, yielding suboptimal performance. To resolve this, we reformulate MIM from reconstructing a single masked image to reconstructing a pair of masked images, which enables pretraining of the transformer module. Additionally, we incorporate a decoder into pretraining for improved upsampling results. Further, to be robust to textureless areas, we propose a novel cross-frame global matching module (CFGM). Since most textureless areas are planar surfaces, we propose a homography loss to further regularize its learning. Combined, these achieve State-of-the-Art (SoTA) performance on geometric matching. Code and models are available at https://github.com/ShngJZ/PMatch.
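
To make the paired reformulation of MIM concrete, here is a minimal NumPy sketch, not the authors' implementation. It masks patches in both frames independently, passes both masked frames to a joint reconstruction function, and scores the reconstruction only on the masked patches. The function names (`patchify`, `random_patch_mask`, `paired_mim_loss`), the `reconstruct` callback, the patch size of 16, the 0.75 mask ratio, and the mean-squared-error objective are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into (N, patch*patch*C) flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)


def random_patch_mask(num_patches, mask_ratio, rng):
    """Boolean vector, True where a patch is hidden from the model."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


def paired_mim_loss(source, support, reconstruct,
                    patch=16, mask_ratio=0.75, seed=0):
    """Hypothetical paired-MIM objective (assumed form, see lead-in).

    `reconstruct(src_in, sup_in)` stands in for the network: it sees both
    masked frames at once and returns a prediction for every patch of each.
    """
    rng = np.random.default_rng(seed)
    src, sup = patchify(source, patch), patchify(support, patch)

    # Each frame gets its own random mask.
    m_src = random_patch_mask(len(src), mask_ratio, rng)
    m_sup = random_patch_mask(len(sup), mask_ratio, rng)

    # Hidden patches are zeroed here; a real model would typically use a
    # learned mask token instead (assumption).
    src_in = np.where(m_src[:, None], 0.0, src)
    sup_in = np.where(m_sup[:, None], 0.0, sup)

    pred_src, pred_sup = reconstruct(src_in, sup_in)

    # Score only the masked patches of each frame, then average the two.
    err_src = ((pred_src - src) ** 2)[m_src].mean()
    err_sup = ((pred_sup - sup) ** 2)[m_sup].mean()
    return 0.5 * (err_src + err_sup)
```

Because `reconstruct` receives both frames together, patches visible in one frame can inform the reconstruction of patches hidden in the other, which is what gives a cross-frame module a training signal that single-image MIM lacks.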
