Self-Supervised Correspondence Estimation via Multiview Registration
This work addresses correspondence estimation in computer vision. It offers an incremental improvement by extending self-supervised learning from close-by frame pairs to longer-range frames.
The paper tackles the problem of learning correspondence estimation from video. Prior approaches rely only on close-by frame pairs. The proposed self-supervised approach instead uses multiview consistency in short RGB-D sequences to increase the diversity and difficulty of sampled pairs, and it performs on par with supervised methods on indoor scenes.
Video provides us with the spatio-temporal consistency needed for visual learning. Recent approaches have utilized this signal to learn correspondence estimation from close-by frame pairs. However, by relying only on close-by frame pairs, those approaches miss out on the richer long-range consistency between distant overlapping frames. To address this, we propose a self-supervised approach for correspondence estimation that learns from multiview consistency in short RGB-D video sequences. Our approach combines pairwise correspondence estimation and registration with a novel SE(3) transformation synchronization algorithm. Our key insight is that self-supervised multiview registration allows us to obtain correspondences over longer time frames, increasing both the diversity and difficulty of sampled pairs. We evaluate our approach on indoor scenes for correspondence estimation and RGB-D pointcloud registration and find that we perform on par with supervised approaches.
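To make the pairwise registration step concrete, the following is a minimal sketch, not the paper's implementation. It assumes 3D correspondences between two RGB-D frames are already given, and it uses the standard weighted Kabsch (Procrustes) solution to recover an SE(3) transform. The function names `kabsch_se3` and `chain_transforms` are hypothetical. The chaining helper is a naive stand-in that only composes consecutive pairwise transforms; it is not the SE(3) synchronization algorithm proposed in the paper.

```python
import numpy as np


def kabsch_se3(src, dst, weights=None):
    """Weighted least-squares rigid transform T (4x4) with dst ~ R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points.
    weights:  optional (N,) non-negative correspondence confidences.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if weights is None:
        weights = np.ones(len(src))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()

    # Weighted centroids and centered point sets.
    src_mean = w @ src
    dst_mean = w @ dst
    src_c = src - src_mean
    dst_c = dst - dst_mean

    # 3x3 weighted cross-covariance and its SVD.
    H = (src_c * w[:, None]).T @ dst_c
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean

    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def chain_transforms(pairwise):
    """Naively compose consecutive transforms T_{i -> i+1}.

    Returns a list whose k-th entry maps frame 0 into frame k.
    Assumption: this simple chaining accumulates drift and is only
    illustrative; it is not the paper's synchronization algorithm.
    """
    poses = [np.eye(4)]
    for T in pairwise:
        poses.append(T @ poses[-1])
    return poses
```

Chaining adjacent pairs in this way is what makes it possible to relate distant frames that were never matched directly, which is the kind of longer-range pair the abstract refers to. A synchronization method would additionally reconcile all pairwise estimates jointly instead of trusting a single chain.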