Relative Camera Pose Estimation Using Convolutional Neural Networks
This work addresses camera pose estimation for computer vision applications, but the contribution is incremental: it builds on existing CNN approaches with specific architectural modifications.
The paper estimates the relative camera pose from two RGB images with a convolutional neural network trained end-to-end. It reports a clear improvement over local feature based baselines (SURF, ORB), with further gains from adding a spatial pyramid pooling (SPP) layer.
This paper presents a convolutional neural network based approach for estimating the relative pose between two cameras. The proposed network takes RGB images from both cameras as input and directly produces the relative rotation and translation as output. The system is trained in an end-to-end manner utilising transfer learning from a large scale classification dataset. The introduced approach is compared with widely used local feature based methods (SURF, ORB) and the results indicate a clear improvement over the baseline. In addition, a variant of the proposed architecture containing a spatial pyramid pooling (SPP) layer is evaluated and shown to further improve the performance.
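To illustrate the general idea behind the SPP layer mentioned above, the following is a minimal sketch of spatial pyramid max pooling in plain Python. It is not the paper's implementation: the function name `spp_max_pool`, the pyramid levels `(1, 2, 4)`, and the nested-list feature map layout are all assumptions chosen for illustration. The sketch shows only the property that makes such a layer useful, namely that feature maps of different spatial sizes are reduced to a vector of the same fixed length.

```python
def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Spatial pyramid max pooling over a feature map (illustrative sketch).

    feature_map: list of C channels, each a list of H rows of W floats.
    levels: hypothetical pyramid levels; level n splits the map into n x n bins.
    Returns a flat list of length C * sum(n * n for n in levels).
    """
    pooled = []
    for channel in feature_map:
        height, width = len(channel), len(channel[0])
        for n in levels:
            for i in range(n):
                # Row range of bin i: floor at the start, ceil at the end,
                # so every bin is non-empty and the bins cover the whole map.
                r0 = (i * height) // n
                r1 = -(-((i + 1) * height) // n)
                for j in range(n):
                    c0 = (j * width) // n
                    c1 = -(-((j + 1) * width) // n)
                    pooled.append(
                        max(channel[r][c]
                            for r in range(r0, r1)
                            for c in range(c0, c1))
                    )
    return pooled


# Two 3-channel feature maps with different spatial sizes (5x7 and 9x13).
map_a = [[[float(r * 7 + c) for c in range(7)] for r in range(5)] for _ in range(3)]
map_b = [[[float(r + c) for c in range(13)] for r in range(9)] for _ in range(3)]

# Both reduce to 3 * (1 + 4 + 16) = 63 values.
print(len(spp_max_pool(map_a)), len(spp_max_pool(map_b)))
```

Because the output length depends only on the channel count and the pyramid levels, a fully connected regression head can follow such a layer regardless of the input image resolution. How the paper configures its pyramid is not stated in the abstract.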