CVROIVAug 4, 2019

Unsupervised Learning of Depth and Deep Representation for Visual Odometry from Monocular Videos in a Metric Space

arXiv:1908.01367v11 citations
AI Analysis

This addresses ego-motion estimation for autonomous systems, offering an incremental improvement by integrating hierarchical features and direct camera motion calculation.

The paper tackles unsupervised learning of depth and hierarchical feature representation for visual odometry from monocular videos, proposing a direct feature odometry framework that achieves competitive results on the KITTI dataset.

For ego-motion estimation, the feature representation of the scenes is crucial. Previous methods indicate that both the low-level and semantic feature-based methods can achieve promising results. Therefore, the incorporation of hierarchical feature representation may benefit from both methods. From this perspective, we propose a novel direct feature odometry framework, named DFO, for depth estimation and hierarchical feature representation learning from monocular videos. By exploiting the metric distance, our framework is able to learn the hierarchical feature representation without supervision. The pose is obtained with a coarse-to-fine approach from high-level to low-level features in enlarged feature maps. The pixel-level attention mask can be self-learned to provide the prior information. In contrast to the previous methods, our proposed method calculates the camera motion with a direct method rather than regressing the ego-motion from the pose network. With this approach, the consistency of the scale factor of translation can be constrained. Additionally, the proposed method is thus compatible with the traditional SLAM pipeline. Experiments on the KITTI dataset demonstrate the effectiveness of our method.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes