3DG-STFM: 3D Geometric Guided Student-Teacher Feature Matching
This paper addresses the problem of dense feature matching for computer vision applications, but its contribution is incremental: it adapts student-teacher learning to a new task.
The paper tackles dense visual correspondence between images by proposing 3DG-STFM, a student-teacher learning method in which a multi-modal teacher model trained with 3D supervision guides a 2D unimodal student model. The method outperforms state-of-the-art methods on camera pose and homography estimation tasks.
We tackle the essential task of finding dense visual correspondences between a pair of images. This is a challenging problem due to factors such as poor texture, repetitive patterns, illumination variation, and motion blur in practical scenarios. In contrast to methods that use dense correspondence ground truths as direct supervision for training local feature matching, we propose 3DG-STFM: we train a multi-modal matching model (Teacher) to enforce depth consistency under 3D dense correspondence supervision and transfer its knowledge to a 2D unimodal matching model (Student). Both the teacher and student models consist of two transformer-based matching modules that obtain dense correspondences in a coarse-to-fine manner. The teacher model guides the student model to learn RGB-induced depth information for matching on both the coarse and fine branches. We also evaluate 3DG-STFM on a model compression task. To the best of our knowledge, 3DG-STFM is the first student-teacher learning method for the local feature matching task. The experiments show that our method outperforms state-of-the-art methods on indoor and outdoor camera pose estimation and on homography estimation. Code is available at: https://github.com/Ryan-prime/3DG-STFM.
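The abstract does not state which loss carries the teacher's knowledge to the student. As a rough illustration only, the sketch below assumes a generic temperature-scaled KL distillation term over the matching scores of one source location against all candidate target locations, blended with a ground-truth cross-entropy term. The function names, the score layout, and the parameters `temperature` and `alpha` are hypothetical and are not taken from the 3DG-STFM code.

```python
import math


def log_softmax(scores, temperature=1.0):
    """Numerically stable log-softmax over a list of raw matching scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def distillation_loss(teacher_scores, student_scores, temperature=2.0):
    """KL(teacher || student) between temperature-softened match distributions.

    Assumption: each list holds the scores of one source location against
    every candidate target location (hypothetical layout).
    """
    log_p = log_softmax(teacher_scores, temperature)
    log_q = log_softmax(student_scores, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def total_student_loss(gt_index, teacher_scores, student_scores,
                       alpha=0.5, temperature=2.0):
    """Blend ground-truth cross-entropy with the teacher-guided term.

    alpha = 0 trains on ground truth only; alpha = 1 follows the teacher only.
    """
    ce = -log_softmax(student_scores)[gt_index]
    kd = distillation_loss(teacher_scores, student_scores, temperature)
    return (1.0 - alpha) * ce + alpha * kd
```

In a coarse-to-fine matcher, a term of this kind could in principle be applied separately on the coarse and the fine branch, with the teacher's scores computed from RGB plus depth input and the student's from RGB alone. The form and weighting of the terms the authors actually use should be taken from the paper and its repository.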