iMatching: Imperative Correspondence Learning
The paper addresses feature correspondence learning, a foundational problem in computer vision that underpins applications such as visual odometry and 3D reconstruction, and proposes a new training paradigm rather than an incremental improvement to existing matchers.
Because accurate per-pixel correspondence labels are hard to obtain, the paper introduces imperative learning (IL), a self-supervised scheme that uses the reprojection error from bundle adjustment as the supervisory signal. On feature matching and pose estimation, it reports an average 30% accuracy gain over state-of-the-art matching models.
Learning feature correspondence is a foundational task in computer vision, holding immense importance for downstream applications such as visual odometry and 3D reconstruction. Despite recent progress in data-driven models, feature correspondence learning is still limited by the lack of accurate per-pixel correspondence labels. To overcome this difficulty, we introduce a new self-supervised scheme, imperative learning (IL), for training feature correspondence models. It enables correspondence learning on arbitrary uninterrupted videos without any camera pose or depth labels, heralding a new era for self-supervised correspondence learning. Specifically, we formulate correspondence learning as a bilevel optimization that takes the reprojection error from bundle adjustment as the supervisory signal for the model. To avoid large memory and computation overhead, we leverage the stationary point to efficiently back-propagate implicit gradients through bundle adjustment. Through extensive experiments on feature matching and pose estimation, we demonstrate superior performance, obtaining an average 30% accuracy gain over state-of-the-art matching models.
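The bilevel structure and the stationary-point gradient can be illustrated on a toy problem. This is a minimal sketch under stated assumptions, not the paper's implementation: a scalar log-scale "pose" x stands in for camera poses and 3D points, hypothetical matched locations q_i(θ) = q0_i + θ_i stand in for the matcher's output, and a Gauss-Newton loop stands in for bundle adjustment. The lower level computes x*(θ) = argmin_x E(x, θ) with E = ½ Σ r_i², and the upper level uses the reprojection error at that solution as the loss. Instead of back-propagating through the solver iterations, the gradient is taken at the stationary point via the implicit function theorem, dx*/dθ = -(∂²E/∂x²)⁻¹ ∂²E/∂x∂θ. All function and variable names are hypothetical.

```python
import math

def residuals(x, theta, p, q0):
    """Toy 'reprojection' residuals r_i = exp(x) * p_i - (q0_i + theta_i)."""
    s = math.exp(x)
    return [s * pi - (qi + ti) for pi, qi, ti in zip(p, q0, theta)]

def energy(x, theta, p, q0):
    """Lower-level objective E(x, theta) = 0.5 * sum_i r_i^2."""
    return 0.5 * sum(r * r for r in residuals(x, theta, p, q0))

def solve_pose(theta, p, q0, x0=0.0, iters=100, tol=1e-14):
    """Lower level: Gauss-Newton on the scalar pose x (a stand-in for bundle adjustment)."""
    x = x0
    for _ in range(iters):
        s = math.exp(x)
        r = residuals(x, theta, p, q0)
        J = [s * pi for pi in p]                   # dr_i/dx
        g = sum(ri * ji for ri, ji in zip(r, J))   # dE/dx
        h = sum(ji * ji for ji in J)               # Gauss-Newton approximation of d2E/dx2
        step = g / h
        x -= step
        if abs(step) < tol:
            break
    return x

def implicit_grad(theta, p, q0):
    """Upper level: L(theta) = E(x*(theta), theta), differentiated with the implicit function theorem."""
    x = solve_pose(theta, p, q0)
    s = math.exp(x)
    r = residuals(x, theta, p, q0)
    g_x = sum(ri * s * pi for ri, pi in zip(r, p))                   # dE/dx, ~0 at the stationary point
    h_xx = sum((s * pi) ** 2 + ri * s * pi for ri, pi in zip(r, p))  # exact d2E/dx2
    grads = []
    for j in range(len(theta)):
        dE_dtheta = -r[j]                   # partial dE/dtheta_j with x held fixed
        d2E_dx_dtheta = -s * p[j]           # mixed partial d2E/(dx dtheta_j)
        dx_dtheta = -d2E_dx_dtheta / h_xx   # IFT: dx*/dtheta_j = -H^{-1} * d2E/(dx dtheta_j)
        grads.append(dE_dtheta + g_x * dx_dtheta)  # chain rule through x*(theta)
    return x, grads

# Usage with made-up toy data (not from the paper).
p = [1.0, 2.0, 3.0]
q0 = [2.1, 3.9, 6.2]
theta = [0.0, 0.1, -0.1]
x_star, grad = implicit_grad(theta, p, q0)
print(x_star, grad)
```

Because the upper-level loss in this toy is the lower-level objective itself, ∂E/∂x is numerically zero at x*, so the implicit correction term nearly vanishes and the gradient reduces to ∂E/∂θ evaluated at x*; with a different upper-level loss the same formula would carry a nonzero correction. Either way, only quantities at the converged solution are needed, so memory does not grow with the number of solver iterations. Whether iMatching's upper-level loss coincides exactly with the bundle adjustment objective is not stated in this abstract.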