CV · Jul 16, 2018

ENG: End-to-end Neural Geometry for Robust Depth and Pose Estimation using CNNs

arXiv:1807.05705v2 · 17 citations
Originality: Incremental advance
AI Analysis

This work addresses robust depth and pose estimation for computer vision applications, representing an incremental improvement over existing methods.

The paper tackles the problem of 3D scene reconstruction and camera pose estimation from images by introducing an end-to-end neural framework that achieves state-of-the-art performance in single image depth prediction for indoor and outdoor scenes, and outperforms previous motion prediction systems.

Recovering structure and motion parameters given an image pair or a sequence of images is a well-studied problem in computer vision. This is often achieved by employing Structure from Motion (SfM) or Simultaneous Localization and Mapping (SLAM) algorithms, depending on the real-time requirements. Recently, with the advent of Convolutional Neural Networks (CNNs), researchers have explored the possibility of using machine learning techniques to reconstruct the 3D structure of a scene and jointly predict the camera pose. In this work, we present a framework that achieves state-of-the-art performance on single image depth prediction for both indoor and outdoor scenes. The depth prediction system is then extended to predict optical flow and ultimately the camera pose, and is trained end-to-end. Our motion estimation framework outperforms previous motion prediction systems, and we also demonstrate that the state-of-the-art metric depths can be further improved using knowledge of the pose.
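The abstract links depth, optical flow, and camera pose without spelling out how they relate. As a rough illustration only, the sketch below uses the standard pinhole-camera relation for a static scene: a pixel with known depth is back-projected to 3D, moved by the relative camera motion, and re-projected, and the pixel displacement is the flow induced by that motion. This is not the paper's network or training procedure; the function name `rigid_flow` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for this example.

```python
from typing import Sequence, Tuple


def rigid_flow(
    u: float, v: float, depth: float,
    fx: float, fy: float, cx: float, cy: float,
    R: Sequence[Sequence[float]], t: Sequence[float],
) -> Tuple[float, float]:
    """Flow at pixel (u, v) induced by camera motion (R, t), static scene assumed.

    Hypothetical helper for illustration; R is a 3x3 rotation (row-major) and
    t a 3-vector mapping points from the first camera frame to the second.
    """
    # Back-project the pixel into the first camera's coordinate frame.
    p1 = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)

    # Transform the 3D point into the second camera's frame: p2 = R * p1 + t.
    x2, y2, z2 = (
        sum(R[i][j] * p1[j] for j in range(3)) + t[i] for i in range(3)
    )
    if z2 <= 0:
        raise ValueError("point is behind the second camera")

    # Project into the second image and take the pixel displacement.
    u2 = fx * x2 / z2 + cx
    v2 = fy * y2 / z2 + cy
    return (u2 - u, v2 - v)


IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# A point 2 m in front of the principal point, shifted 0.1 m along x
# in the second camera's frame.
flow = rigid_flow(320.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0,
                  IDENTITY, (0.1, 0.0, 0.0))
```

With these made-up intrinsics, the example yields a purely horizontal flow of about 25 pixels. The same relation shows why the three quantities constrain each other: given any two of depth, pose, and rigid flow, the third is tightly determined for static scene points.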

