CV · Sep 22, 2018

RPNet: an End-to-End Network for Relative Camera Pose Estimation

arXiv:1809.08402v1 · 72 citations
Originality: Highly original
AI Analysis

This paper addresses robust camera pose estimation for computer vision applications. It offers a novel approach that removes the scale ambiguity in translation recovery, though it remains an incremental advance in deep learning-based pose estimation.

The paper tackles relative camera pose estimation from image pairs. It proposes RPNet, an end-to-end deep neural network that infers relative poses directly, without camera intrinsics or extrinsics, and recovers the full translation vector. Results on the Cambridge Landmark dataset are promising: RPNet is more accurate and more stable than traditional methods, especially on challenging images.

This paper addresses the task of relative camera pose estimation from raw image pixels by means of deep neural networks. The proposed RPNet network takes pairs of images as input and directly infers the relative poses, without the need for camera intrinsics or extrinsics. While state-of-the-art systems based on SIFT + RANSAC are able to recover the translation vector only up to scale, RPNet is trained to produce the full translation vector in an end-to-end way. Experimental results on the Cambridge Landmark dataset are very promising regarding the recovery of the full translation vector. They also show that RPNet produces more accurate and more stable results than traditional approaches, especially for hard images (repetitive textures, textureless images, etc.). To the best of our knowledge, RPNet is the first attempt to recover full translation vectors in relative pose estimation.
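The "only up to scale" limitation of geometric pipelines mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the pinhole model, the example pose, and every function name below are assumptions chosen for illustration. The sketch shows that scaling the scene and the translation by the same factor leaves both images unchanged, so point correspondences alone cannot determine the length of the translation vector.

```python
import math

def rot_y(angle):
    """Rotation matrix about the y axis (illustrative example pose)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matvec(M, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def project(p):
    """Pinhole projection onto the normalized image plane (z = 1)."""
    return (p[0] / p[2], p[1] / p[2])

def observe(points, R, t):
    """Project 3D points into camera 1 (identity pose) and camera 2 (R, t)."""
    view1 = [project(p) for p in points]
    view2 = [project([a + b for a, b in zip(matvec(R, p), t)]) for p in points]
    return view1, view2

def max_diff(a, b):
    """Largest absolute coordinate difference between two lists of 2D points."""
    return max(abs(u - v) for pa, pb in zip(a, b) for u, v in zip(pa, pb))

# Assumed relative pose and scene, chosen only for the demonstration.
R = rot_y(math.radians(10.0))
t = [0.5, 0.0, 0.1]
points = [[0.2, -0.1, 4.0], [-1.0, 0.5, 6.0], [0.7, 0.3, 5.0]]

# Scale the scene and the translation by the same factor.
scale = 3.0
scaled_points = [[scale * c for c in p] for p in points]
scaled_t = [scale * c for c in t]

v1, v2 = observe(points, R, t)
s1, s2 = observe(scaled_points, R, scaled_t)

# Both differences are at floating-point noise level: the image pairs match.
print(max_diff(v1, s1), max_diff(v2, s2))
```

Because the two configurations produce the same pixel observations, a method that relies only on matched keypoints can recover the translation direction but not its magnitude. A learned regressor, as described in the abstract, is instead trained to output the full translation vector.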

Code Implementations: 1 repo
