SC-Net: Robust Correspondence Learning via Spatial and Cross-Channel Context
This work addresses two-view correspondence learning, a core problem in computer vision for applications such as 3D reconstruction and robotics, and offers an incremental improvement over existing CNN-based methods through better context aggregation and motion field refinement.
The paper proposes SC-Net, which integrates spatial and cross-channel context to produce more accurate motion fields, and reports state-of-the-art performance on the YFCC100M and SUN3D datasets for relative pose estimation and outlier removal.
Recent research has focused on using convolutional neural networks (CNNs) as backbones in two-view correspondence learning, and these backbones have shown clear advantages over methods based on multilayer perceptrons (MLPs). However, generic CNN backbones that are not tailored to the task may fail to aggregate global context effectively and tend to oversmooth dense motion fields in scenes with large disparity. To address these problems, we propose a novel network named SC-Net, which integrates bilateral context from both spatial and channel perspectives. Specifically, we design an adaptive focused regularization module (AFR) to enhance the model's position awareness and its robustness against spurious motion samples, which yields a more accurate motion field. We then propose a bilateral field adjustment module (BFA) that refines the motion field by modeling long-range relationships while enabling interaction across the spatial and channel dimensions. Finally, a position-aware recovery module (PAR) recovers the motion vectors from the refined field while preserving consistency and precision. Extensive experiments show that SC-Net outperforms state-of-the-art methods in relative pose estimation and outlier removal on the YFCC100M and SUN3D datasets. Source code is available at http://www.linshuyuan.com.
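The abstract describes generating a dense motion field from sparse correspondences and later recovering per-correspondence motion vectors from that field. As a rough illustration of that general sparse-to-dense-to-sparse idea only (not SC-Net's AFR, BFA, or PAR modules, whose internals the abstract does not describe), the sketch below averages sparse motion vectors into grid cells and then reads a motion vector back at each keypoint by bilinear interpolation. The normalized [0, 1] coordinates, the grid size, the cell-averaging rule, and the function names `build_motion_field` and `recover_motion` are all hypothetical choices made for this example.

```python
import numpy as np


def build_motion_field(coords, motions, grid=16):
    """Average sparse motion vectors into a dense (grid, grid, 2) field.

    Hypothetical helper. coords: (N, 2) keypoint positions (x, y), assumed
    normalized to [0, 1]. motions: (N, 2) motion vectors, e.g. x2 - x1.
    """
    field = np.zeros((grid, grid, 2))
    counts = np.zeros((grid, grid))
    # Snap each normalized coordinate to its nearest grid cell (column=x, row=y).
    idx = np.clip(np.round(coords * (grid - 1)).astype(int), 0, grid - 1)
    rows, cols = idx[:, 1], idx[:, 0]
    # Unbuffered accumulation so several keypoints in one cell all contribute.
    np.add.at(field, (rows, cols), motions)
    np.add.at(counts, (rows, cols), 1)
    # Empty cells keep zero motion; occupied cells hold the mean motion.
    return field / np.maximum(counts, 1)[..., None]


def recover_motion(field, coords):
    """Bilinearly sample the field at normalized keypoint positions.

    Hypothetical helper; assumes a square field of shape (grid, grid, 2).
    """
    grid = field.shape[0]
    p = coords * (grid - 1)
    # Top-left corner of the enclosing cell, clipped so x0 + 1 stays in range.
    x0 = np.clip(np.floor(p[:, 0]).astype(int), 0, grid - 2)
    y0 = np.clip(np.floor(p[:, 1]).astype(int), 0, grid - 2)
    wx = (p[:, 0] - x0)[:, None]
    wy = (p[:, 1] - y0)[:, None]
    top = (1 - wx) * field[y0, x0] + wx * field[y0, x0 + 1]
    bottom = (1 - wx) * field[y0 + 1, x0] + wx * field[y0 + 1, x0 + 1]
    return (1 - wy) * top + wy * bottom


if __name__ == "__main__":
    coords = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
    motions = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
    field = build_motion_field(coords, motions, grid=3)
    recovered = recover_motion(field, coords)
    # Residual between original and recovered motion for each correspondence.
    print(np.linalg.norm(motions - recovered, axis=1))
```

In this toy setup, a large residual between a correspondence's original motion and the value recovered from the field is one hand-crafted cue that it disagrees with its neighbors and may be an outlier; learned methods such as the one described above replace these fixed averaging and interpolation rules with trained modules.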