CV · LG · RO · Dec 19, 2019

Instance-wise Depth and Motion Learning from Monocular Videos

arXiv:1912.09351v2 · 9 citations · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of 3D scene understanding in autonomous driving by improving depth and motion estimation. The advance is incremental, although it introduces novel components.

The paper tackles the problem of jointly estimating depth, ego-motion, and motion of multiple dynamic objects from monocular videos without supervision, achieving state-of-the-art performance on the KITTI dataset.

We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision. Our technical contributions are three-fold. First, we propose a differentiable forward rigid projection module that plays a key role in our instance-wise depth and motion learning. Second, we design an instance-wise photometric and geometric consistency loss that effectively decomposes background and moving object regions. Lastly, we introduce a new auto-annotation scheme to produce video instance segmentation maps that will be utilized as input to our training pipeline. These proposed elements are validated in a detailed ablation study. Through extensive experiments conducted on the KITTI dataset, our framework is shown to outperform the state-of-the-art depth and motion estimation methods. Our code and dataset will be available at https://github.com/SeokjuLee/Insta-DM.
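
To make the idea of rigid projection in the abstract concrete, here is a minimal NumPy sketch of the underlying geometry: back-project each pixel with its depth, apply a 6-DoF rigid transform, and re-project with a pinhole camera. This is an illustration under stated assumptions, not the authors' module. The function name `rigid_project`, its arguments, and the pinhole model are hypothetical choices, and the sketch only computes where each source pixel lands. It does not perform the differentiable forward warping (splatting) that the paper describes.

```python
import numpy as np

def rigid_project(depth, K, T):
    """Map every source pixel to target-view pixel coordinates.

    Hypothetical helper for illustration only.
    depth: (H, W) source depth map
    K:     (3, 3) pinhole camera intrinsics (assumed shared by both views)
    T:     (4, 4) rigid transform from source-camera to target-camera frame
    Returns (H, W, 2) target pixel coordinates and (H, W) projected depth.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)

    # Back-project: X = D * K^{-1} [u, v, 1]^T
    rays = pix @ np.linalg.inv(K).T
    pts = rays * depth[..., None]

    # Apply the 6-DoF rigid motion (rotation, then translation).
    pts_t = pts @ T[:3, :3].T + T[:3, 3]

    # Re-project into the target image plane.
    proj = pts_t @ K.T
    z = proj[..., 2]
    uv = proj[..., :2] / z[..., None]
    return uv, z
```

With an identity transform the function returns the original pixel grid. An instance-wise variant, in the spirit of the abstract, would presumably apply a separate transform to the pixels inside each object's segmentation mask and the ego-motion transform to the background; that composition is an assumption here, not something the abstract spells out.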

Code Implementations: 1 repo