DistanceNet: Estimating Traveled Distance from Monocular Images using a Recurrent Convolutional Neural Network
This work addresses a specific bottleneck in robotics and autonomous navigation, scale estimation for monocular systems, by estimating traveled distance more accurately. The contribution is incremental, since it builds on existing hybrid deep learning approaches.
The paper tackles the scale ambiguity problem in monocular visual SLAM/odometry by proposing DistanceNet, a recurrent convolutional neural network that estimates traveled distance from image sequences. On the KITTI dataset it outperforms state-of-the-art deep learning pose estimators and classical monocular vSLAM/VO methods in distance prediction.
Classical monocular vSLAM/VO methods suffer from the scale ambiguity problem. Hybrid approaches solve this problem by adding deep learning methods, for example by using depth maps which are predicted by a CNN. We suggest that it is better to base scale estimation on estimating the traveled distance for a set of subsequent images. In this paper, we propose a novel end-to-end many-to-one traveled distance estimator. By using a deep recurrent convolutional neural network (RCNN), the traveled distance between the first and last image of a set of consecutive frames is estimated by our DistanceNet. Geometric features are learned in the CNN part of our model, which are subsequently used by the RNN to learn dynamics and temporal information. Moreover, we exploit the natural order of distances by using ordinal regression to predict the distance. The evaluation on the KITTI dataset shows that our approach outperforms current state-of-the-art deep learning pose estimators and classical mono vSLAM/VO methods in terms of distance prediction. Thus, our DistanceNet can be used as a component to solve the scale problem and help improve current and future classical mono vSLAM/VO methods.
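The abstract states that ordinal regression is used to exploit the natural order of distances, but gives no details of the encoding. The following is a minimal sketch of one common ordinal formulation, not the authors' implementation: the distance range is split into bins, the target is a vector of binary answers to "is the distance greater than threshold k?", and the prediction is decoded by counting how many thresholds are judged exceeded. The function names, the uniform bin spacing, the 10 m range, and the 5 bins are all hypothetical choices made for illustration.

```python
from typing import List


def make_thresholds(d_max: float, num_bins: int) -> List[float]:
    """Return the num_bins - 1 interior thresholds of uniform bins on [0, d_max].

    Uniform spacing is an assumption of this sketch, not taken from the paper.
    """
    width = d_max / num_bins
    return [width * k for k in range(1, num_bins)]


def encode_ordinal(distance: float, thresholds: List[float]) -> List[int]:
    """Binary target vector: entry k is 1 if the distance exceeds threshold k."""
    return [1 if distance > t else 0 for t in thresholds]


def decode_ordinal(probs: List[float], d_max: float) -> float:
    """Count the thresholds judged exceeded (p > 0.5) and return that bin's centre."""
    num_bins = len(probs) + 1
    width = d_max / num_bins
    rank = sum(1 for p in probs if p > 0.5)
    return (rank + 0.5) * width


# Hypothetical setup: distances up to 10 m, split into 5 bins of 2 m each.
thresholds = make_thresholds(10.0, 5)
target = encode_ordinal(5.3, thresholds)

# Stand-in for per-threshold probabilities produced by a network's output layer.
predicted = decode_ordinal([0.9, 0.8, 0.2, 0.1], 10.0)
```

With this encoding, a prediction that is off by one bin disagrees with the target in a single entry, while a prediction that is far off disagrees in many, so the loss reflects how wrong the estimate is. Plain classification over bins would treat all wrong bins as equally wrong.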