CV, Jul 18, 2017

Robust Monocular SLAM for Egocentric Videos

arXiv:1707.05564v2
Originality: Incremental advance
AI Analysis

The paper addresses a specific problem for robotics and AR/VR applications, namely that SLAM is unreliable on egocentric video, though the advance appears incremental because it adapts existing SFM techniques to SLAM.

The paper tackles the failure of state-of-the-art SLAM techniques on egocentric videos by identifying causes like dominant 3D rotations and low parallax, and proposes a method that solves SLAM as a Structure from Motion problem over sliding windows with 2D rotation and translation averaging. The technique successfully handles long, shaky egocentric videos where other methods fail, as validated on public datasets.

Despite tremendous progress, a truly general-purpose pipeline for Simultaneous Localization and Mapping (SLAM) remains a challenge. We investigate the reported failure of state-of-the-art (SOTA) SLAM techniques on egocentric videos. We find that dominant 3D rotations, low parallax between successive frames, and primarily forward motion in egocentric videos are the most common causes of failure. The incremental nature of SOTA SLAM, in the presence of unreliable pose and 3D estimates in egocentric videos and with no opportunities for global loop closures, generates drift and leads to the eventual failure of such techniques. Taking inspiration from batch-mode Structure from Motion (SFM) techniques, we propose to solve SLAM as an SFM problem over sliding temporal windows. This makes the problem well constrained. Further, we propose to initialize the camera poses using 2D rotation averaging, followed by translation averaging, before structure estimation using bundle adjustment. This helps in stabilizing the camera poses when 3D estimates are not reliable. We show that the proposed SLAM technique, incorporating these two key ideas, works successfully for long, shaky egocentric videos where other SOTA techniques have been reported to fail. Qualitative and quantitative comparisons on publicly available egocentric video datasets validate our results.
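To make the sliding-window idea concrete, the sketch below shows one way a video's frame indices could be split into overlapping temporal windows, each of which would then be treated as a small batch SFM problem. It is only an illustration: the function name `sliding_windows` and the `window_size` and `stride` parameters are assumptions introduced here, and the abstract does not state the window length or overlap the authors use. The per-window steps named in the abstract (rotation averaging, translation averaging, bundle adjustment) are not implemented.

```python
from typing import Iterator, List


def sliding_windows(num_frames: int, window_size: int, stride: int) -> Iterator[List[int]]:
    """Yield overlapping lists of frame indices covering frames 0..num_frames-1.

    window_size and stride are hypothetical parameters for illustration;
    the abstract does not specify them.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    if stride > window_size:
        raise ValueError("stride must not exceed window_size, or frames would be skipped")
    start = 0
    while start < num_frames:
        end = min(start + window_size, num_frames)
        yield list(range(start, end))
        if end == num_frames:
            break  # the last window already reaches the final frame
        start += stride


# Example: 10 frames, windows of 4 frames, advancing 2 frames at a time.
windows = list(sliding_windows(num_frames=10, window_size=4, stride=2))
for w in windows:
    # In a full pipeline, each window would be solved as a batch SFM problem here.
    print(w)
```

Choosing a stride smaller than the window size makes consecutive windows share frames, which is one plausible way to keep pose estimates consistent from one window to the next; whether the paper does exactly this is not stated in the abstract.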
