U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization
This work addresses relocalization for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails, pairing height-aware BEV segmentation with matching against neural SD-map data.
The paper introduces U-BEV, a U-Net inspired architecture that lets the Bird's-Eye-View representation reason about the scene on multiple height layers before the BEV features are flattened, which improves segmentation by up to 4.11 IoU. Combined with a differentiable template matcher for relocalization on neural SD-map data, the end-to-end trainable model outperforms transformer-based BEV methods of similar computational complexity by 1.7 to 2.8 mIoU and improves on BEV-based relocalization by over 26% Recall Accuracy on the nuScenes dataset.
Efficient relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails. Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance and, in turn, can benefit the relocalization of the vehicle. However, one downside of BEV methods is the heavy computation required to leverage the geometric constraints. This paper presents U-BEV, a U-Net inspired architecture that extends the current state-of-the-art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features. We show that this extension boosts the performance of U-BEV by up to 4.11 IoU. Additionally, we combine the encoded neural BEV with a differentiable template matcher to perform relocalization on neural SD-map data. The model is fully end-to-end trainable and outperforms transformer-based BEV methods of similar computational complexity by 1.7 to 2.8 mIoU and BEV-based relocalization by over 26% Recall Accuracy on the nuScenes dataset.
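
To make the idea of differentiable template matching concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes the encoded BEV and the neural map are feature tensors of shape (C, h, w) and (C, H, W), scores each map location by negative sum of squared differences (an assumed similarity measure; the abstract does not name one), and turns the scores into a position estimate with a softmax followed by a soft-argmax. The function name `match_template` and the `temperature` parameter are hypothetical.

```python
import numpy as np


def match_template(map_feat, bev_feat, temperature=1.0):
    """Slide bev_feat (C, h, w) over map_feat (C, H, W).

    Returns a probability map over top-left offsets and the expected
    (row, col) offset under that distribution (soft-argmax).
    The similarity measure (negative SSD) is an assumption of this sketch.
    """
    _, h, w = bev_feat.shape
    # All (h, w) windows of the map: shape (C, H-h+1, W-w+1, h, w).
    windows = np.lib.stride_tricks.sliding_window_view(
        map_feat, (h, w), axis=(1, 2)
    )
    # Negative sum of squared differences; 0 for a perfect match.
    diff = windows - bev_feat[:, None, None, :, :]
    scores = -(diff ** 2).sum(axis=(0, 3, 4))
    # Numerically stable softmax over all candidate offsets.
    logits = scores / temperature
    logits = logits - logits.max()
    probs = np.exp(logits)
    probs = probs / probs.sum()
    # Soft-argmax: expectation of the offset, a smooth function of the scores.
    rows, cols = np.indices(probs.shape)
    expected = np.array([(probs * rows).sum(), (probs * cols).sum()])
    return probs, expected


# Usage: crop a patch from a random "map" and recover its offset.
rng = np.random.default_rng(0)
demo_map = rng.normal(size=(4, 20, 24))
demo_bev = demo_map[:, 5:13, 7:15]
demo_probs, demo_offset = match_template(demo_map, demo_bev)
print(demo_offset)
```

Because the softmax and the expectation are smooth in the scores, gradients of a localization loss can in principle flow back into whatever networks produce the two feature tensors, which is what makes an end-to-end trainable pipeline of this kind possible. In an automatic differentiation framework the same steps would be written with that framework's tensor operations.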