Beyond Tracking: Selecting Memory and Refining Poses for Deep Visual Odometry
This work addresses visual odometry for robotics and autonomous systems, offering a novel framework that improves performance in difficult conditions, though it builds incrementally on existing learning-based methods.
The authors move beyond pure tracking by adding Memory and Refining components to a visual odometry pipeline. The resulting method outperforms state-of-the-art learning-based approaches by a large margin and is competitive with classic monocular methods, with its strongest gains in challenging scenarios such as texture-less regions and abrupt motions.
Most previous learning-based visual odometry (VO) methods treat VO as a pure tracking problem. In contrast, we present a VO framework that incorporates two additional components, Memory and Refining. The Memory component preserves global information by employing an adaptive and efficient selection strategy. The Refining component improves previous results using the contexts stored in the Memory, adopting a spatial-temporal attention mechanism for feature distilling. Experiments on the KITTI and TUM-RGBD benchmark datasets demonstrate that our method outperforms state-of-the-art learning-based methods by a large margin and produces competitive results against classic monocular VO approaches. In particular, our model achieves outstanding performance in challenging scenarios such as texture-less regions and abrupt motions, where classic VO algorithms tend to fail.
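To make the roles of the two components concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the selection criterion or the attention form, so everything here is an assumption: the hypothetical `select_memory` keeps a frame's feature only when its estimated translation has moved at least `min_motion` from the last stored frame, and the hypothetical `refine_with_attention` uses plain scaled dot-product attention over the stored features (temporal only; the spatial part is omitted).

```python
import numpy as np


def select_memory(features, poses, min_motion=0.5):
    """Keep a frame's feature only if its pose moved far enough.

    Assumption: `poses` holds one translation vector per frame, and the
    Euclidean distance to the last stored frame is the selection criterion.
    """
    memory = [features[0]]          # always store the first frame
    last_pose = poses[0]
    for feat, pose in zip(features[1:], poses[1:]):
        if np.linalg.norm(pose - last_pose) >= min_motion:
            memory.append(feat)
            last_pose = pose
    return np.stack(memory)


def refine_with_attention(query, memory):
    """Return a context vector as a softmax-weighted sum of memory rows.

    Assumption: scaled dot-product scores between the current feature
    (`query`) and each stored feature.
    """
    scores = memory @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())   # subtract max for stability
    weights /= weights.sum()
    context = weights @ memory
    return context, weights


# Toy usage: four frames with 4-dimensional features and 3-D translations.
features = np.arange(16, dtype=float).reshape(4, 4)
poses = np.array([[0.0, 0, 0], [0.1, 0, 0], [0.7, 0, 0], [0.8, 0, 0]])

memory = select_memory(features, poses)            # stores frames 0 and 2
context, weights = refine_with_attention(features[3], memory)
```

In this toy setup only frames whose motion exceeds the threshold enter the memory, so the stored context stays small while still covering the trajectory; the refined context could then be combined with the current feature before pose regression.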