CV · AI · May 19, 2022

Unsupervised Learning of Depth, Camera Pose and Optical Flow from Monocular Video

arXiv:2205.09821v22 citations · h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient multi-task perception in autonomous driving, though it is an incremental advance that builds on existing unsupervised geometry-learning approaches.

The authors tackle the problem of jointly estimating depth, optical flow, and camera pose from monocular video without supervision. They report results comparable to state-of-the-art models while shrinking the model to 8.4M parameters (less than 5% of the size of existing models).

We propose DFPNet -- an unsupervised, joint learning system for Depth, Optical Flow and egomotion (Camera Pose) estimation from monocular image sequences. Due to the nature of 3D scene geometry, these three components are coupled. We leverage this fact to jointly train all three components in an end-to-end manner. A single composite loss function -- which involves image reconstruction-based loss for depth & optical flow, bidirectional consistency checks and smoothness loss components -- is used to train the network. Using hyperparameter tuning, we are able to reduce the model size to less than 5% (8.4M parameters) of state-of-the-art DFP models. Evaluation on the KITTI and Cityscapes driving datasets reveals that our model achieves results comparable to state-of-the-art in all three tasks, even with the significantly smaller model size.
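The composite loss named in the abstract can be illustrated with a minimal NumPy sketch covering only the optical-flow branch. This is not the authors' implementation: the function names, the plain L1 photometric error, the first-order smoothness term, and the weights `w_smooth` and `w_cons` are all assumptions chosen for illustration, and the depth and pose terms are omitted.

```python
import numpy as np

def warp(img, flow):
    """Backward-warp a 2D array (H, W) by flow (H, W, 2) using bilinear sampling.

    flow[..., 0] is the horizontal displacement, flow[..., 1] the vertical one.
    Sample coordinates are clamped to the image border.
    """
    h, w = img.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bot = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bot

def photometric_loss(target, source, flow):
    """Image reconstruction term: mean L1 error after warping source to target."""
    return np.mean(np.abs(target - warp(source, flow)))

def smoothness_loss(field):
    """Mean absolute first-order gradient of a 2D field."""
    return np.abs(np.diff(field, axis=1)).mean() + np.abs(np.diff(field, axis=0)).mean()

def consistency_loss(flow_fwd, flow_bwd):
    """Bidirectional check: forward flow plus the backward flow sampled at the
    forward-displaced position should cancel out."""
    bwd_at_fwd = np.stack(
        [warp(flow_bwd[..., c], flow_fwd) for c in range(2)], axis=-1
    )
    return np.mean(np.abs(flow_fwd + bwd_at_fwd))

def composite_loss(img_t, img_s, flow_fwd, flow_bwd, w_smooth=0.1, w_cons=0.2):
    """Weighted sum of the three terms (weights are hypothetical)."""
    recon = (photometric_loss(img_t, img_s, flow_fwd)
             + photometric_loss(img_s, img_t, flow_bwd))
    smooth = sum(smoothness_loss(f[..., c])
                 for f in (flow_fwd, flow_bwd) for c in range(2))
    cons = consistency_loss(flow_fwd, flow_bwd)
    return recon + w_smooth * smooth + w_cons * cons
```

In a joint system of the kind the abstract describes, analogous reconstruction and smoothness terms would presumably be computed for depth, with the camera pose supplying the warp between frames, so that all three outputs receive gradients from one scalar loss.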

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
