CV | Apr 12, 2021

MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition

arXiv:2104.05327v2 | 89 citations | Has Code
Originality: Incremental advance
AI Analysis

This work addresses place recognition for robotics and autonomous vehicles. It is an incremental advance: it combines two existing modalities, LiDAR point clouds and RGB images, and adds a training strategy that mitigates the dominating modality problem.

The authors tackled the problem of place recognition in robotics by introducing MinkLoc++, a multimodal descriptor that fuses LiDAR point clouds and RGB images using late fusion, achieving state-of-the-art performance on standard benchmarks. They also identified and mitigated the dominating modality problem in training, where one modality overfits and reduces evaluation performance.
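The late fusion idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the per-modality descriptors are hypothetical hand-written vectors standing in for the outputs of the point cloud and image branches, and concatenation of L2-normalized descriptors is an assumed fusion operator chosen for simplicity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def late_fusion(cloud_descriptor, image_descriptor):
    """Fuse two per-modality descriptors at the end of the pipeline.

    Assumption: fusion is plain concatenation of the normalized
    descriptors; other operators (sum, learned projection) are possible.
    """
    return l2_normalize(cloud_descriptor) + l2_normalize(image_descriptor)


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical branch outputs: (point cloud descriptor, image descriptor).
query = late_fusion([0.9, 0.1, 0.0], [0.2, 0.8])
same_place = late_fusion([0.8, 0.2, 0.1], [0.3, 0.7])
other_place = late_fusion([0.0, 0.1, 0.9], [0.9, 0.1])

# Place recognition then reduces to nearest-neighbour search on the
# fused descriptors.
candidates = {"same_place": same_place, "other_place": other_place}
best_match = min(candidates, key=lambda name: euclidean(query, candidates[name]))
```

Because each branch produces its own descriptor before fusion, either modality can also be evaluated in isolation, which is what makes per-modality diagnostics possible during training.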

We introduce a discriminative multimodal descriptor based on a pair of sensor readings: a point cloud from a LiDAR and an image from an RGB camera. Our descriptor, named MinkLoc++, can be used for place recognition, re-localization and loop closure in robotics or autonomous vehicle applications. We use a late fusion approach, where each modality is processed separately and the results are fused in the final part of the processing pipeline. The proposed method achieves state-of-the-art performance on standard place recognition benchmarks. We also identify the dominating modality problem, which arises when training a multimodal descriptor. The problem manifests itself when the network focuses on the modality that overfits more strongly to the training data. This drives the loss down during training but leads to suboptimal performance on the evaluation set. In this work we describe how to detect and mitigate this risk when using a deep metric learning approach to train a multimodal neural network. Our code is publicly available on the project website: https://github.com/jac99/MinkLocMultimodal.
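One plausible way to watch for a dominating modality is to track a triplet loss separately for each modality on training and held-out data, then compare the gaps. The sketch below makes several assumptions that do not come from the abstract: the triplet margin loss, the `ratio` threshold, the function names and the toy one-dimensional descriptors are all hypothetical, and the paper's own detection procedure may differ.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss for one (anchor, positive, negative)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def mean_triplet_loss(triplets, margin=0.2):
    """Average triplet loss over a list of triplets."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)


def overfit_gap(train_triplets, val_triplets, margin=0.2):
    """Held-out loss minus training loss; a large gap suggests overfitting."""
    return mean_triplet_loss(val_triplets, margin) - mean_triplet_loss(train_triplets, margin)


def dominating_modality(gaps, ratio=2.0):
    """Return the modality whose gap exceeds `ratio` times the runner-up.

    `ratio` is a hypothetical threshold; returns None if no modality stands out.
    """
    ranked = sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)
    (top_name, top_gap), (_, second_gap) = ranked[0], ranked[1]
    if top_gap > 0 and top_gap > ratio * max(second_gap, 0.0):
        return top_name
    return None


# Toy 1-D descriptors computed from each modality alone (hypothetical values).
image_train = [([0.0], [0.1], [1.0])]   # well separated on training data
image_val = [([0.0], [0.6], [0.5])]     # violated on held-out data
cloud_train = [([0.0], [0.2], [0.3])]
cloud_val = [([0.0], [0.2], [0.25])]

gaps = {
    "image": overfit_gap(image_train, image_val),
    "cloud": overfit_gap(cloud_train, cloud_val),
}
suspect = dominating_modality(gaps)
```

In this toy setup the image branch fits its training triplet perfectly but fails on the held-out one, so it is flagged. A joint loss alone would hide this, because the fused descriptor can still look good on training data.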

Code Implementations: 1 repo
