2D3D-MatchNet: Learning to Match Keypoints Across 2D Image and 3D Point Cloud
This work addresses a difficulty in visual pose estimation for robotics and autonomous systems by enabling direct matching between 2D images and 3D point clouds, though it appears incremental because it builds on existing deep learning methods for descriptor learning.
The paper tackles the problem of establishing 2D-3D correspondences between images and point clouds for visual pose estimation. It proposes 2D3D-MatchNet, an end-to-end deep network that learns descriptors for both modalities, and reports experimental results that verify the feasibility of the approach.
Large-scale point clouds generated from 3D sensors are more accurate than their image-based counterparts. However, they are seldom used in visual pose estimation because 2D-3D correspondences between images and point clouds are difficult to obtain. In this paper, we propose 2D3D-MatchNet, an end-to-end deep network architecture that jointly learns descriptors for 2D keypoints from an image and 3D keypoints from a point cloud. As a result, we are able to directly match keypoints and establish 2D-3D correspondences between the query image and the 3D point cloud reference map for visual pose estimation. We create our Oxford 2D-3D Patches dataset from the Oxford Robotcar dataset, with ground truth camera poses and 2D-3D image to point cloud correspondences, for training and testing the deep network. Experimental results verify the feasibility of our approach.
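To make the matching step concrete, here is a minimal sketch of how jointly learned descriptors could be turned into 2D-3D correspondences. It is an illustration under stated assumptions, not the network or the matching procedure from the paper: it assumes the 2D and 3D descriptors lie in a shared Euclidean space, and it uses mutual nearest-neighbour matching as one common choice. The function names `nearest` and `match_2d_3d` and the toy descriptor values are hypothetical.

```python
import math


def nearest(query, candidates):
    """Return the index of the candidate descriptor closest to `query` (Euclidean)."""
    return min(range(len(candidates)), key=lambda j: math.dist(query, candidates[j]))


def match_2d_3d(desc_2d, desc_3d):
    """Mutual nearest-neighbour matching between two descriptor lists.

    Assumption: both lists hold vectors of the same length from a shared
    embedding space. Returns (i, j) pairs meaning that 2D keypoint i is
    matched to 3D keypoint j.
    """
    if not desc_2d or not desc_3d:
        return []
    matches = []
    for i, descriptor in enumerate(desc_2d):
        j = nearest(descriptor, desc_3d)
        # Keep the pair only if the 3D descriptor also selects this 2D descriptor.
        if nearest(desc_3d[j], desc_2d) == i:
            matches.append((i, j))
    return matches


# Toy, hand-made descriptors (hypothetical values, not learned features).
image_descriptors = [[0.0, 1.0], [1.0, 0.0], [5.0, 4.0]]
cloud_descriptors = [[0.9, 0.1], [0.1, 0.9]]
pairs = match_2d_3d(image_descriptors, cloud_descriptors)
print(pairs)
```

In a full pipeline, each matched pair would link a pixel location in the query image to a 3D point in the reference map, which is the kind of input a camera pose solver consumes. The mutual check is a design choice that discards one-sided matches, such as the third image descriptor above, which has no close counterpart in the point cloud.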