CV · RO · Dec 6, 2022

Attention-Enhanced Cross-modal Localization Between 360 Images and Point Clouds

arXiv:2212.02757v1 · 5 citations · h-index: 55
Originality: Incremental advance
AI Analysis

This paper addresses localization for intelligent robots and autonomous driving when GNSS is unreliable, offering a low-cost and potentially robust solution. The advance is incremental because it builds on existing cross-modal methods.

The paper tackles visual localization by correlating 360 equirectangular images to point clouds using an end-to-end learnable network with attention mechanisms, demonstrating effectiveness through experiments on sequences based on the KITTI-360 dataset.

Visual localization plays an important role in intelligent robots and autonomous driving, especially when GNSS accuracy is unreliable. Recently, camera localization in LiDAR maps has attracted growing attention for its low cost and potential robustness to illumination and weather changes. However, the commonly used pinhole camera has a narrow field of view, which limits the information it captures compared with omnidirectional LiDAR data. To overcome this limitation, we focus on correlating the information of 360 equirectangular images to point clouds, proposing an end-to-end learnable network that conducts cross-modal visual localization by establishing similarity in a high-dimensional feature space. Inspired by the attention mechanism, we optimize the network to capture the salient features for comparing images and point clouds. We construct several sequences containing 360 equirectangular images and corresponding point clouds based on the KITTI-360 dataset and conduct extensive experiments. The results demonstrate the effectiveness of our approach.
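The idea of localizing by similarity in a shared feature space can be illustrated with a toy sketch. This is not the paper's network: the feature vectors, the `query` vector, and the function names `attention_pool` and `localize` are all hypothetical, chosen only to show how attention-weighted pooling might produce one descriptor per image and how that descriptor could be matched against point cloud descriptors by cosine similarity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Pool local features into one descriptor.

    Each local feature is weighted by softmax(dot(feature, query)), so
    features aligned with the (hypothetical) query contribute more.
    """
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def localize(image_descriptor, map_descriptors):
    """Return the index of the most similar map descriptor and all similarities."""
    sims = [cosine_similarity(image_descriptor, d) for d in map_descriptors]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims


# Toy data (made up for illustration): three local image features in 2-D.
image_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query = [2.0, 0.0]  # stand-in for a learned attention query
image_descriptor = attention_pool(image_features, query)

# Stand-ins for descriptors of three point cloud submaps.
map_descriptors = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
best_index, similarities = localize(image_descriptor, map_descriptors)
print(best_index, similarities)
```

In a learned system the descriptors would come from image and point cloud encoders trained so that matching pairs lie close together. Here they are fixed by hand to keep the sketch self-contained.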

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
