TartanVO: A Generalizable Learning-based VO
This addresses the need for robust VO systems in robotics and autonomous vehicles, offering a generalizable solution that is incremental by building on existing learning-based approaches.
The authors tackled the problem of visual odometry (VO) by developing TartanVO, a learning-based model that generalizes to multiple real-world datasets like KITTI and EuRoC without finetuning, outperforming geometry-based methods in challenging scenes.
We present the first learning-based visual odometry (VO) model, which generalizes to multiple datasets and real-world scenarios and outperforms geometry-based methods in challenging scenes. We achieve this by leveraging the SLAM dataset TartanAir, which provides a large amount of diverse synthetic data in challenging environments. Furthermore, to make our VO model generalize across datasets, we propose an up-to-scale loss function and incorporate the camera intrinsic parameters into the model. Experiments show that a single model, TartanVO, trained only on synthetic data, without any finetuning, can be generalized to real-world datasets such as KITTI and EuRoC, demonstrating significant advantages over the geometry-based methods on challenging trajectories. Our code is available at https://github.com/castacks/tartanvo.