MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation
This work addresses accurate 3D pedestrian localization for autonomous driving and robotics with a lightweight method that also estimates uncertainty, though its gains over existing monocular methods are incremental.
The paper tackles the ill-posed problem of 3D pedestrian localization from monocular RGB images by predicting confidence intervals with a Laplace-based loss. It reports state-of-the-art results on the KITTI and nuScenes datasets and outperforms a stereo-based method for far-away pedestrians.
We tackle the fundamentally ill-posed problem of 3D human localization from monocular RGB images. Driven by the limitation of neural networks outputting point estimates, we address the ambiguity in the task by predicting confidence intervals through a loss function based on the Laplace distribution. Our architecture is a light-weight feed-forward neural network that predicts 3D locations and corresponding confidence intervals given 2D human poses. The design is particularly well suited for small training data, cross-dataset generalization, and real-time applications. Our experiments show that we (i) outperform state-of-the-art results on KITTI and nuScenes datasets, (ii) even outperform a stereo-based method for far-away pedestrians, and (iii) estimate meaningful confidence intervals. We further share insights on our model of uncertainty in cases of limited observations and out-of-distribution samples.
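To illustrate how a loss based on the Laplace distribution can yield confidence intervals, the following is a minimal sketch of the standard Laplace negative log-likelihood and the central interval it implies. It is not taken from the paper: the function names `laplace_nll` and `laplace_interval`, the `log_b` parameterization of the spread, and the use of absolute rather than relative distance error are all assumptions made for illustration, and the paper's exact loss may differ.

```python
import math


def laplace_nll(target, mu, log_b):
    """Negative log-likelihood of `target` under Laplace(mu, b), b = exp(log_b).

    Predicting log_b instead of b keeps the spread positive without
    constraints (an illustrative choice, not confirmed by the source).
    """
    b = math.exp(log_b)
    # -log p(x) for the density p(x) = exp(-|x - mu| / b) / (2 b)
    return abs(target - mu) / b + math.log(2.0 * b)


def laplace_interval(mu, log_b, coverage=0.95):
    """Central interval holding `coverage` of the Laplace(mu, b) probability mass."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    b = math.exp(log_b)
    # P(|X - mu| <= t) = 1 - exp(-t / b)  =>  t = -b * ln(1 - coverage)
    half_width = -b * math.log(1.0 - coverage)
    return mu - half_width, mu + half_width


# Hypothetical example: a pedestrian predicted at 20 m with spread b = 1.5 m.
low, high = laplace_interval(mu=20.0, log_b=math.log(1.5), coverage=0.95)
loss = laplace_nll(target=21.0, mu=20.0, log_b=math.log(1.5))
```

Minimizing this quantity penalizes a large spread through the log term and an overconfident small spread through the scaled absolute error, which is why the learned spread can be read as an uncertainty estimate.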