Unifying Local and Global Multimodal Features for Place Recognition in Aliased and Low-Texture Environments
This work aims to improve SLAM reliability for robots operating in aliased and low-texture environments. It is an incremental advance whose impact is largely domain-specific.
The paper tackles place recognition in environments with perceptual aliasing and weak textures. It proposes UMF, a model that unifies local and global multimodal features using cross-attention between vision and LiDAR, and reports that UMF significantly outperforms previous baselines in planetary-analogous environments.
Perceptual aliasing and weak textures pose significant challenges to place recognition, hindering the performance of Simultaneous Localization and Mapping (SLAM) systems. This paper presents a novel model called UMF (Unifying Local and Global Multimodal Features) that 1) leverages multi-modality through cross-attention blocks between vision and LiDAR features, and 2) includes a re-ranking stage that uses local feature matching to re-order the top-k candidates retrieved with a global representation. Our experiments, particularly on sequences captured in a planetary-analogous environment, show that UMF significantly outperforms previous baselines in these challenging aliased environments. Since our work aims to enhance the reliability of SLAM in all situations, we also evaluate it on the widely used RobotCar dataset to assess its broader applicability. Code and models are available at https://github.com/DLR-RM/UMF
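The two-stage retrieval described in the abstract (global retrieval of top-k candidates, then re-ranking by local feature matching) can be illustrated with a minimal sketch. This is not the UMF implementation: the function names, the toy descriptors, and the use of mutual nearest-neighbour counting as the local matching score are all assumptions made for illustration. In UMF the descriptors would come from the fused vision and LiDAR branches.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query_global, db_globals, k):
    """Stage 1: rank database places by global-descriptor similarity, keep k."""
    scores = [(cosine(query_global, g), i) for i, g in enumerate(db_globals)]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:k]]

def mutual_matches(query_locals, cand_locals):
    """Hypothetical local score: number of mutual nearest-neighbour pairs."""
    if not query_locals or not cand_locals:
        return 0
    def nearest(src, dst):
        return [max(range(len(dst)), key=lambda j: cosine(s, dst[j])) for s in src]
    q2c = nearest(query_locals, cand_locals)
    c2q = nearest(cand_locals, query_locals)
    return sum(1 for i, j in enumerate(q2c) if c2q[j] == i)

def rerank(query_locals, db_locals, candidates):
    """Stage 2: re-order candidates by local score; ties keep the global order."""
    scored = [(mutual_matches(query_locals, db_locals[c]), c) for c in candidates]
    return [c for _, c in sorted(scored, key=lambda s: -s[0])]

# Toy data (assumed): places 0 and 1 look alike globally (perceptual aliasing),
# but only place 1 agrees with the query at the local-feature level.
db_globals = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
db_locals = [
    [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]],
    [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]],
    [[0.0, 0.0, 1.0]],
]
query_global = [1.0, 0.0]
query_locals = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

candidates = retrieve_top_k(query_global, db_globals, k=2)
reranked = rerank(query_locals, db_locals, candidates)
print(candidates, reranked)
```

In this toy setup the global stage ranks the aliased place 0 first, and the local stage promotes place 1 because more of its local features match the query mutually. The global stage stays cheap over the whole database, while the costlier local matching runs only on the k retrieved candidates.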