Momo: Monocular Motion Estimation on Manifolds
This work addresses the need for reliable pose estimation in autonomous driving, particularly in challenging environments, though it appears incremental, building on existing visual odometry by integrating a vehicle motion model.
The paper tackles the problem of obtaining a high-quality visual odometry prior for autonomous vehicle localization by introducing Momo, a monocular frame-to-frame motion estimation method that incorporates the vehicle's motion model. The results show that Momo outperforms state-of-the-art methods in low-structure environments and, on the KITTI dataset and a multi-camera dataset, estimates the prior with high accuracy and in real time from only 100-300 feature matches.
Knowledge about the location of a vehicle is indispensable for autonomous driving. In order to apply global localisation methods, a pose prior is required, which can be obtained from visual odometry. The quality and robustness of that prior determine the success of localisation. Momo is a monocular frame-to-frame motion estimation methodology that provides high-quality visual odometry for this purpose. By taking into account the motion model of the vehicle, the reliability and accuracy of the pose prior are significantly improved. We show that Momo outperforms the state of the art, especially in low-structure environments. Moreover, the method is designed so that multiple cameras, with or without overlapping fields of view, can be integrated. The evaluation on the KITTI dataset and on our own multi-camera dataset shows that even with only 100-300 feature matches, the prior is estimated with high accuracy and in real time.
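To illustrate why a vehicle motion model helps when only a few hundred matches are available, here is a minimal sketch. The abstract does not say which motion model Momo uses. As an assumption for illustration only, the sketch uses the planar circular-motion model known from Scaramuzza's 1-point RANSAC. In that model the frame-to-frame motion reduces to a yaw change `theta` and a chord length `rho`, and the epipolar constraint depends only on `theta` because monocular scale is unobservable. All function names are hypothetical. The camera frame is assumed to coincide with the vehicle frame (x forward, z up). A brute-force grid search stands in for a proper optimiser. This is not Momo's manifold optimisation or its multi-camera formulation.

```python
import math
import random

# Hypothetical helpers: tiny 3x3 linear algebra in pure Python.
def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def skew(t):
    x, y, z = t
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def circular_motion(theta, rho=1.0):
    """Assumed planar circular motion: pose of frame 2 in frame 1.
    Yaw changes by theta; the chord of length rho points at theta / 2."""
    t_m = [rho * math.cos(theta / 2), rho * math.sin(theta / 2), 0.0]
    return rot_z(theta), t_m

def essential_from_motion(theta):
    # Points map as X2 = R_m^T (X1 - t_m), so R = R_m^T and t = -R_m^T t_m.
    # Scale is unobservable monocularly, so rho = 1 is used.
    r_m, t_m = circular_motion(theta)
    r = transpose(r_m)
    t = [-c for c in matvec(r, t_m)]
    return matmul(skew(t), r)

def epipolar_residual(e, b1, b2):
    # Algebraic epipolar error b2^T E b1 for unit bearing vectors.
    eb1 = matvec(e, b1)
    return sum(b2[i] * eb1[i] for i in range(3))

def estimate_yaw(matches, lo=-0.3, hi=0.3, steps=601):
    """1-DoF grid search over yaw minimising the squared epipolar error."""
    best_theta, best_cost = lo, float("inf")
    for k in range(steps):
        theta = lo + (hi - lo) * k / (steps - 1)
        e = essential_from_motion(theta)
        cost = sum(epipolar_residual(e, b1, b2) ** 2 for b1, b2 in matches)
        if cost < best_cost:
            best_theta, best_cost = theta, cost
    return best_theta

def make_matches(theta, rho, n, seed=0):
    """Synthetic noise-free bearing correspondences for testing."""
    rng = random.Random(seed)
    r_m, t_m = circular_motion(theta, rho)
    r_mt = transpose(r_m)
    matches = []
    for _ in range(n):
        x1 = [rng.uniform(5.0, 20.0), rng.uniform(-5.0, 5.0), rng.uniform(-1.0, 2.0)]
        x2 = matvec(r_mt, [x1[i] - t_m[i] for i in range(3)])
        matches.append((normalize(x1), normalize(x2)))
    return matches
```

For example, `estimate_yaw(make_matches(0.05, 1.2, 100))` should return a value close to `0.05`. Under this assumed model, a single free parameter is constrained by every match, which hints at why a motion-model prior can remain well conditioned with a small number of correspondences.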