CORAL: Colored structural representation for bi-modal place recognition
This work addresses robust place recognition, which drift-free localization systems depend on and which is therefore crucial for autonomous navigation in varying environments.
The paper proposes CORAL-VLAD, a bi-modal place recognition method that combines vision and LiDAR data into a compound global descriptor. It outperforms state-of-the-art methods on the Oxford RobotCar dataset and generalizes to different scenes and sensor configurations.
Place recognition is indispensable for a drift-free localization system. Because the environment varies, place recognition that relies on a single modality has limitations. In this paper, we propose a bi-modal place recognition method that extracts a compound global descriptor from two modalities, vision and LiDAR. Specifically, we first build an elevation image from the 3D points as a structural representation. We then derive the correspondences between 3D points and image pixels and use them to merge the pixel-wise visual features into the grids of the elevation map. In this way, we fuse the structural and visual features in a consistent bird's-eye view frame, yielding a semantic representation that we name CORAL. We call the whole network CORAL-VLAD. Comparisons on the Oxford RobotCar dataset show that CORAL-VLAD outperforms other state-of-the-art methods. We also demonstrate on cross-city datasets that our network generalizes to other scenes and sensor configurations.
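The fusion steps described above can be illustrated with a minimal sketch. It is not the CORAL-VLAD implementation, and several details are assumptions made only for illustration: the 3D points are taken to be already expressed in the camera frame (x right, y down, z forward), the correspondence between points and pixels is a plain pinhole projection with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, each elevation cell keeps the maximum point height, and the visual features that fall into one cell are simply averaged. The function names `to_cell`, `elevation_image`, `project`, and `fuse` are likewise hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, z), assumed camera frame: x right, y down, z forward
Cell = Tuple[int, int]              # (row, col) in the bird's-eye-view grid


def to_cell(p: Point, n: int, cell: float) -> Optional[Cell]:
    """Return the grid cell containing p, or None if p lies outside the n x n map."""
    x, _, z = p
    row = math.floor(z / cell)          # rows grow with forward distance
    col = math.floor(x / cell + n / 2)  # columns are centred on the sensor
    return (row, col) if 0 <= row < n and 0 <= col < n else None


def elevation_image(points: List[Point], n: int, cell: float) -> Dict[Cell, float]:
    """Structural representation: maximum height per cell (height = -y, as y points down)."""
    elev: Dict[Cell, float] = {}
    for p in points:
        c = to_cell(p, n, cell)
        if c is not None:
            elev[c] = max(elev.get(c, -math.inf), -p[1])
    return elev


def project(p: Point, fx: float, fy: float, cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Pinhole projection to an integer pixel (u, v); None if the point is behind the camera."""
    x, y, z = p
    if z <= 0:
        return None
    return math.floor(fx * x / z + cx), math.floor(fy * y / z + cy)


def fuse(points: List[Point], feat: List[List[List[float]]], n: int, cell: float,
         fx: float, fy: float, cx: float, cy: float) -> Dict[Cell, List[float]]:
    """Merge pixel-wise features into grid cells by averaging (an assumed merge rule)."""
    h, w = len(feat), len(feat[0])
    sums: Dict[Cell, List[float]] = {}
    counts: Dict[Cell, int] = {}
    for p in points:
        c = to_cell(p, n, cell)
        uv = project(p, fx, fy, cx, cy)
        if c is None or uv is None:
            continue
        u, v = uv
        if not (0 <= u < w and 0 <= v < h):
            continue  # the point is not visible in the image
        f = feat[v][u]
        acc = sums.setdefault(c, [0.0] * len(f))
        for i, val in enumerate(f):
            acc[i] += val
        counts[c] = counts.get(c, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
```

In this sketch, `elevation_image` and `fuse` share the same cell indexing, so the structural channel and the visual channels are aligned in one bird's-eye view grid. A real pipeline would additionally apply the LiDAR-to-camera extrinsic calibration before projecting and would use learned feature maps rather than raw values.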