R4Dyn: Exploring Radar for Self-Supervised Monocular Depth Estimation of Dynamic Scenes
This work addresses a safety issue in autonomous vehicles by improving depth estimation for traffic participants. The contribution is incremental, however, since it integrates radar into existing self-supervised frameworks.
The paper tackles erroneous depth predictions for dynamic objects in self-supervised monocular depth estimation for driving scenarios. It introduces R4Dyn, which uses cost-efficient radar data both as weak supervision during training and as an extra input at inference time, resulting in a 37% improvement on cars on the nuScenes dataset.
While self-supervised monocular depth estimation in driving scenarios has achieved performance comparable to supervised approaches, violations of the static world assumption can still lead to erroneous depth predictions of traffic participants, posing a potential safety issue. In this paper, we present R4Dyn, a novel set of techniques to use cost-efficient radar data on top of a self-supervised depth estimation framework. In particular, we show how radar can be used during training as a weak supervision signal, as well as an extra input to enhance the estimation robustness at inference time. Since automotive radars are readily available, this allows training data to be collected from a variety of existing vehicles. Moreover, by filtering and expanding the signal to make it compatible with learning-based approaches, we address radar-inherent issues, such as noise and sparsity. With R4Dyn we are able to overcome a major limitation of self-supervised depth estimation, i.e., the prediction of traffic participants. We substantially improve the estimation on dynamic objects, such as cars, by 37% on the challenging nuScenes dataset, hence demonstrating that radar is a valuable additional sensor for monocular depth estimation in autonomous vehicles.
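To make the idea of radar as a weak supervision signal concrete, the following is a minimal sketch and not the loss used in R4Dyn, whose exact formulation is not given in the text above. It assumes radar returns have already been projected into the image plane as a sparse depth map, with 0.0 marking pixels that have no return, and penalizes the predicted depth only at those sparse locations. The function name `sparse_radar_l1_loss`, the 0.0 convention, and the choice of an L1 penalty are all illustrative assumptions; plain Python lists stand in for image tensors.

```python
def sparse_radar_l1_loss(pred_depth, radar_depth):
    """Mean absolute depth error over pixels that have a radar return.

    pred_depth:  H x W nested list of predicted depths in metres.
    radar_depth: H x W nested list of projected radar depths;
                 0.0 marks a pixel with no radar return (assumed convention).
    """
    total, count = 0.0, 0
    for pred_row, radar_row in zip(pred_depth, radar_depth):
        for pred, radar in zip(pred_row, radar_row):
            if radar > 0.0:  # radar is sparse: skip empty pixels
                total += abs(pred - radar)
                count += 1
    # With no radar points in the frame, the term contributes nothing.
    return total / count if count else 0.0


# Toy 2 x 3 depth map with two radar returns (hypothetical values).
pred = [[10.0, 10.0, 10.0],
        [10.0, 10.0, 10.0]]
radar = [[0.0, 12.0, 0.0],
         [0.0, 0.0, 9.0]]
loss = sparse_radar_l1_loss(pred, radar)
print(loss)  # prints 1.5
```

In a full training setup, a term like this would presumably be added with a small weight to the self-supervised photometric objective, so that the few radar points nudge the depth of moving objects without dominating the loss.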