EMA-VIO: Deep Visual-Inertial Odometry with External Memory Attention
This work addresses localization challenges for mobile agents, particularly in difficult conditions such as overcast days and water-filled ground, and represents an incremental improvement over prior learning-based methods.
The paper tackles the problem of accurate and robust localization for mobile agents by proposing a deep learning-based visual-inertial odometry framework with external memory attention, which outperforms existing traditional and learning-based baselines in various challenging scenarios.
Accurate and robust localization is a fundamental need for mobile agents. Visual-inertial odometry (VIO) algorithms exploit information from camera and inertial sensors to estimate position and orientation. Recent deep learning based VIO models have attracted attention because they provide pose information in a data-driven way, without the need to design hand-crafted algorithms. Existing learning based VIO models rely on recurrent models to fuse multimodal data and process sensor signals, which are hard to train and not efficient enough. We propose a novel learning based VIO framework with external memory attention that effectively and efficiently combines visual and inertial features for state estimation. Our proposed model is able to estimate pose accurately and robustly, even in challenging scenarios, e.g., on overcast days and water-filled ground, where traditional VIO algorithms struggle to extract visual features. Experiments validate that it outperforms both traditional and learning based VIO baselines in different scenes.
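
The abstract does not give the form of the external memory attention layer, so the following is only a minimal sketch of one common formulation, in which input tokens attend to a small set of learned memory slots instead of to each other. All names here (`external_memory_attention`, `mem_key`, `mem_value`), the dimensions, the choice to stack visual and inertial features as tokens, and the random values standing in for trained parameters are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def external_memory_attention(features, mem_key, mem_value):
    """Attend from input tokens to a small external memory.

    features:  n tokens x d channels
    mem_key:   s slots  x d channels (hypothetical learned parameter)
    mem_value: s slots  x d channels (hypothetical learned parameter)
    """
    scores = matmul(features, transpose(mem_key))  # n x s similarity scores
    weights = [softmax(row) for row in scores]     # each row sums to 1
    return matmul(weights, mem_value), weights     # n x d output, n x s weights


def random_matrix(rows, cols, rng):
    """Random placeholder values; a trained model would learn these."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, slots = 4, 3
visual = random_matrix(2, d, rng)    # stand-in for visual encoder output
inertial = random_matrix(2, d, rng)  # stand-in for inertial encoder output
tokens = visual + inertial           # assumption: stack both modalities as tokens

mem_key = random_matrix(slots, d, rng)
mem_value = random_matrix(slots, d, rng)
fused, weights = external_memory_attention(tokens, mem_key, mem_value)
print(len(fused), len(fused[0]))
```

In this formulation the memory has a fixed number of slots, so the cost of the attention step grows linearly with the number of input tokens, and the layer carries no recurrent state between steps.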