CV | IV | Oct 4, 2019

Two Stream Networks for Self-Supervised Ego-Motion Estimation

arXiv:1910.01764v3 | 17 citations
Originality: Incremental advance
AI Analysis

This work targets accurate visual odometry for autonomous driving. It is an incremental advance over prior self-supervised methods, with strong gains on that specific task.

The paper tackles self-supervised ego-motion estimation from unlabeled RGB video by proposing a two-stream network that takes RGB and inferred depth as input. It reports state-of-the-art results among learning-based methods on the KITTI odometry benchmark and shows that performance scales with training data, up to 1 million frames.

Learning depth and camera ego-motion from raw unlabeled RGB video streams is seeing exciting progress through self-supervision from strong geometric cues. To leverage not only appearance but also scene geometry, we propose a novel self-supervised two-stream network using RGB and inferred depth information for accurate visual odometry. In addition, we introduce a sparsity-inducing data augmentation policy for ego-motion learning that effectively regularizes the pose network to enable stronger generalization performance. As a result, we show that our proposed two-stream pose network achieves state-of-the-art results among learning-based methods on the KITTI odometry benchmark, and is especially suited for self-supervision at scale. Our experiments on a large-scale urban driving dataset of 1 million frames indicate that the performance of our proposed architecture does indeed scale progressively with more data.
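For readers new to visual odometry, the sketch below illustrates the general task rather than the paper's method: a pose network predicts frame-to-frame camera motion, and those relative transforms are chained into a trajectory that an odometry benchmark can score. The function names, the yaw-only rotation, and the convention that each transform gives the next camera's pose in the current camera's frame are all assumptions made for illustration. None of them come from the paper.

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def relative_pose(yaw, tx, ty, tz):
    """Hypothetical helper: 4x4 rigid transform with a rotation of `yaw`
    radians about the vertical (y) axis and translation (tx, ty, tz).
    A real pose network would predict a full 6-DoF motion instead."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, 0.0, s, tx],
        [0.0, 1.0, 0.0, ty],
        [-s, 0.0, c, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def accumulate_trajectory(relative_poses):
    """Chain frame-to-frame transforms into camera positions expressed in
    the first frame's coordinates (assumed convention: each transform is
    the next camera's pose in the current camera's frame)."""
    pose = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    positions = [(0.0, 0.0, 0.0)]
    for rel in relative_poses:
        pose = matmul4(pose, rel)  # compose on the right: world <- current <- next
        positions.append((pose[0][3], pose[1][3], pose[2][3]))
    return positions


# Usage: drive 1 unit forward and turn 90 degrees, four times (a closed square).
steps = [relative_pose(math.pi / 2, 0.0, 0.0, 1.0) for _ in range(4)]
trajectory = accumulate_trajectory(steps)
```

Because every pose is the product of all earlier relative estimates, a small per-frame error compounds along the trajectory, which is why accuracy of the relative estimates matters so much for odometry.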

