Dynamo-Depth: Fixing Unsupervised Depth Estimation for Dynamical Scenes
This work addresses a key limitation of depth estimation for autonomous driving and robotics, namely the handling of dynamic scenes, though it is an incremental advance over existing unsupervised methods.
The paper tackles inaccurate depth estimates for moving objects in unsupervised monocular depth estimation. It introduces Dynamo-Depth, which jointly learns depth, 3D flow, and motion segmentation from videos, and reports state-of-the-art performance on the Waymo Open and nuScenes datasets, with significant improvements in the depth of moving objects.
Unsupervised monocular depth estimation techniques have demonstrated encouraging results but typically assume that the scene is static. These techniques suffer when trained on dynamical scenes, where apparent object motion can equally be explained by hypothesizing the object's independent motion, or by altering its depth. This ambiguity causes depth estimators to predict erroneous depth for moving objects. To resolve this issue, we introduce Dynamo-Depth, a unifying approach that disambiguates dynamical motion by jointly learning monocular depth, a 3D independent flow field, and motion segmentation from unlabeled monocular videos. Specifically, we offer our key insight that a good initial estimation of motion segmentation is sufficient for jointly learning depth and independent motion despite the fundamental underlying ambiguity. Our proposed method achieves state-of-the-art performance on monocular depth estimation on the Waymo Open and nuScenes datasets, with significant improvement in the depth of moving objects. Code and additional results are available at https://dynamo-depth.github.io.
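The ambiguity described above can be illustrated with a small numerical sketch. This is not the authors' code. It assumes a one-dimensional pinhole camera that moves straight forward, and the focal length, distances, and function names (project_x, static_equivalent_depth) are all hypothetical choices made for illustration. It shows that the pixel shift of a moving object can be reproduced exactly by a static point placed at a different depth.

```python
def project_x(focal_px, x_m, z_m):
    """Horizontal pixel offset from the principal point for a pinhole camera."""
    return focal_px * x_m / z_m


def static_equivalent_depth(u_before, u_after, ego_forward_m):
    """Depth a static point would need in order to produce the same pixel shift.

    Solves u_before = f*X/Z and u_after = f*X/(Z - ego_forward_m) for Z.
    """
    shift = u_after - u_before
    if shift == 0:
        # No parallax at all: a static point would have to be infinitely far away.
        return float("inf")
    return u_after * ego_forward_m / shift


# All values below are hypothetical and chosen only for illustration.
FOCAL_PX = 1000.0        # focal length in pixels
EGO_FORWARD_M = 1.0      # camera moves 1 m forward between frames
X_M, Z_M = 2.0, 20.0     # true lateral offset and depth of the object
OBJECT_FORWARD_M = 0.5   # object also moves forward, slower than the camera

# Pixel position before and after, with both ego-motion and object motion.
u0 = project_x(FOCAL_PX, X_M, Z_M)
u1 = project_x(FOCAL_PX, X_M, Z_M - EGO_FORWARD_M + OBJECT_FORWARD_M)

# Depth that explains the same shift if the object is assumed to be static.
z_static = static_equivalent_depth(u0, u1, EGO_FORWARD_M)
print(f"true depth: {Z_M:.1f} m, static-scene explanation: {z_static:.1f} m")
```

Under these assumptions the static-scene explanation places the object at about 40 m instead of its true 20 m. If the object moved forward at exactly the camera's speed, the pixel shift would vanish and the static explanation would push the depth to infinity, which is why a method that also models independent motion needs some additional signal, such as motion segmentation, to choose between the two explanations.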