CV · Aug 25, 2022

A Compacted Structure for Cross-domain learning on Monocular Depth and Flow Estimation

Yu Chen, Xu Cao, Xiaoyi Lin, Baoru Huang, Xiao-Yun Zhou, Jian-Qing Zheng, Guang-Zhong Yang

arXiv:2208.11993v11.41 citations · h-index: 22

Originality: Incremental advance

AI Analysis

This work addresses motion and depth recovery for autonomous driving and robot vision, presenting an incremental improvement over existing multi-task approaches.

The paper tackles the problem of joint monocular depth and optical flow estimation for robot vision by introducing a multi-task scheme with Flow to Depth (F2D), Depth to Flow (D2F), and Exponential Moving Average (EMA) mechanisms, achieving improved performance on KITTI datasets compared to other multi-task methods.

Accurate motion and depth recovery is important for many robot vision tasks, including autonomous driving. Most previous studies have achieved cooperative multi-task interaction via either pre-defined loss functions or cross-domain prediction. This paper presents a multi-task scheme that achieves mutual assistance through three mechanisms: Flow to Depth (F2D), Depth to Flow (D2F), and Exponential Moving Average (EMA). The F2D and D2F mechanisms enable multi-scale information integration between the optical flow and depth domains using differentiable shallow networks. A dual-head mechanism predicts optical flow for rigid and non-rigid motion in a divide-and-conquer manner, which significantly improves optical flow estimation performance. Furthermore, to make the predictions more robust and stable, EMA is used during multi-task training. Experimental results on the KITTI datasets show that our multi-task scheme outperforms other multi-task schemes and provides marked improvements in the prediction results.
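The abstract names EMA as a stabilizing mechanism but does not say how it is applied. As a rough illustration only, the sketch below shows the generic exponential moving average update on a dictionary of scalar parameters. Applying it to model weights, the function name `ema_update`, and the `decay` values are assumptions made for this example, not details taken from the paper.

```python
def ema_update(shadow, params, decay=0.99):
    """Blend the current parameters into a running average, in place.

    Hypothetical helper: `shadow` holds the averaged values and `params`
    the latest values. Both are plain dicts of floats for illustration.
    """
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow


# Toy usage: the average moves smoothly toward a constant target of 1.0.
shadow = {"w": 0.0}
for latest in [1.0, 1.0, 1.0]:
    ema_update(shadow, {"w": latest}, decay=0.5)

print(shadow["w"])  # 0.875
```

A decay closer to 1 gives a slower, smoother average, which is the usual reason such an average is preferred over the raw values when stability matters.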

