CV · Sep 11, 2025

S-BEVLoc: BEV-based Self-supervised Framework for Large-scale LiDAR Global Localization

arXiv:2509.09110v1 · h-index: 7 · IEEE Robot Autom Lett
Originality: Highly original
AI Analysis

The work addresses the high cost of acquiring ground-truth poses for training LiDAR localization networks in SLAM systems, offering a scalable solution for autonomous vehicles and robotics.

The paper tackles the problem of LiDAR-based global localization without ground-truth poses by proposing S-BEVLoc, a self-supervised framework using bird's-eye view images, which achieves state-of-the-art performance on KITTI and NCLT datasets.

LiDAR-based global localization is an essential component of simultaneous localization and mapping (SLAM), which helps loop closure and re-localization. Current approaches rely on ground-truth poses obtained from GPS or SLAM odometry to supervise network training. Despite the great success of these supervised approaches, substantial cost and effort are required for high-precision ground-truth pose acquisition. In this work, we propose S-BEVLoc, a novel self-supervised framework based on bird's-eye view (BEV) for LiDAR global localization, which eliminates the need for ground-truth poses and is highly scalable. We construct training triplets from single BEV images by leveraging the known geographic distances between keypoint-centered BEV patches. A convolutional neural network (CNN) is used to extract local features, and NetVLAD is employed to aggregate them into global descriptors. Moreover, we introduce SoftCos loss to enhance learning from the generated triplets. Experimental results on the large-scale KITTI and NCLT datasets show that S-BEVLoc achieves state-of-the-art performance in place recognition, loop closure, and global localization tasks, while offering scalability that supervised approaches could match only with extra effort.
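
The triplet construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mine_triplets`, the BEV resolution `metres_per_pixel`, and the two distance thresholds `pos_radius_m` and `neg_radius_m` are all assumptions introduced here, and keypoint detection, patch cropping, the CNN, NetVLAD, and the SoftCos loss are left out. The sketch only shows how known distances between patch centres within one BEV image could label positives and negatives without any ground-truth pose.

```python
import numpy as np

def mine_triplets(keypoints_px, metres_per_pixel, pos_radius_m, neg_radius_m):
    """Return (anchor, positive, negative) index triplets over keypoints.

    keypoints_px: (N, 2) patch-centre pixel coordinates in a single BEV image.
    A patch counts as a positive for an anchor if its centre lies within
    pos_radius_m of the anchor, and as a negative if it lies at least
    neg_radius_m away. Both thresholds are hypothetical, not from the paper.
    """
    # Convert pixel coordinates to metres using the (assumed) BEV resolution.
    pts = np.asarray(keypoints_px, dtype=float) * metres_per_pixel
    # Pairwise Euclidean distances in metres between all patch centres.
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

    n_points = len(pts)
    triplets = []
    for a in range(n_points):
        positives = [p for p in range(n_points)
                     if p != a and dists[a, p] <= pos_radius_m]
        negatives = [n for n in range(n_points)
                     if dists[a, n] >= neg_radius_m]
        triplets.extend((a, p, n) for p in positives for n in negatives)
    return triplets

# Three patch centres on a line, at 0 m, 2 m and 40 m for 0.4 m per pixel.
example = mine_triplets([[0, 0], [5, 0], [100, 0]], 0.4, 5.0, 20.0)
```

In this toy case the two nearby patches act as each other's positives and the distant patch is the negative for both, while the distant patch has no positive and so anchors no triplet.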

