Disentangling Object Motion and Occlusion for Unsupervised Multi-frame Monocular Depth
This work addresses the dynamic-scene challenge in monocular depth estimation for applications such as autonomous driving. The contribution is incremental: it builds on existing methods and improves them at both the prediction level and the loss level.
The paper tackles the accuracy degradation of self-supervised monocular depth prediction in dynamic scenes by addressing object-motion mismatch and occlusion. It reports significant performance improvements on the Cityscapes and KITTI datasets, especially in dynamic-object regions.
Conventional self-supervised monocular depth prediction methods are based on a static environment assumption, which leads to accuracy degradation in dynamic scenes due to the mismatch and occlusion problems introduced by object motions. Existing dynamic-object-focused methods have only partially solved the mismatch problem at the training loss level. In this paper, we accordingly propose a novel multi-frame monocular depth prediction method to solve these problems at both the prediction and supervision loss levels. Our method, called DynamicDepth, is a new framework trained via a self-supervised cycle-consistent learning scheme. A Dynamic Object Motion Disentanglement (DOMD) module is proposed to disentangle object motions to solve the mismatch problem. Moreover, a novel occlusion-aware Cost Volume and Re-projection Loss are designed to alleviate the occlusion effects of object motions. Extensive analyses and experiments on the Cityscapes and KITTI datasets show that our method significantly outperforms state-of-the-art monocular depth prediction methods, especially in the areas of dynamic objects. Code is available at https://github.com/AutoAILab/DynamicDepth
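To make the idea of an occlusion-aware re-projection loss concrete, here is a minimal sketch of one common way to keep occluded pixels out of a photometric loss. It is not the DynamicDepth implementation: the function name, the grayscale (H, W) inputs, the plain L1 error, the boolean occlusion masks, and the per-pixel minimum over two source frames are all assumptions made for illustration.

```python
import numpy as np


def occlusion_aware_reprojection_loss(target, warped_prev, warped_next,
                                      occluded_prev, occluded_next):
    """Illustrative masked photometric loss (hypothetical, not the paper's code).

    target, warped_prev, warped_next: float arrays of shape (H, W).
    occluded_prev, occluded_next: bool arrays of shape (H, W), True where the
    corresponding warped source frame is assumed to be occluded.
    """
    # Per-pixel L1 photometric error against each warped source frame.
    err_prev = np.abs(target - warped_prev)
    err_next = np.abs(target - warped_next)

    # Occluded pixels must not contribute: mark them as +inf so the
    # minimum below always prefers the non-occluded source.
    err_prev = np.where(occluded_prev, np.inf, err_prev)
    err_next = np.where(occluded_next, np.inf, err_next)

    # Keep the smaller error per pixel, then average over pixels that are
    # visible in at least one source frame.
    per_pixel = np.minimum(err_prev, err_next)
    valid = np.isfinite(per_pixel)
    if not valid.any():
        return 0.0
    return float(per_pixel[valid].mean())


if __name__ == "__main__":
    target = np.zeros((2, 2))
    warped_prev = np.ones((2, 2))
    warped_next = np.full((2, 2), 3.0)
    occluded_prev = np.array([[True, False], [False, False]])
    occluded_next = np.array([[False, False], [False, True]])
    print(occlusion_aware_reprojection_loss(
        target, warped_prev, warped_next, occluded_prev, occluded_next))
```

In this toy setup a pixel occluded in one source frame is supervised only by the other frame, and a pixel occluded in both is dropped from the average. How the occlusion masks are obtained is outside the scope of the sketch.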