CV | Mar 12, 2024

MinkUNeXt: Point Cloud-based Large-scale Place Recognition using 3D Sparse Convolutions

arXiv:2403.07593v2 | 12 citations | h-index: 3
AI Analysis

The paper addresses place recognition for robotics and autonomous systems and offers an efficient alternative to more complex methods such as Transformers. The contribution is incremental, since it builds on existing 3D convolution techniques.

This paper tackles large-scale place recognition from point clouds by proposing MinkUNeXt, an architecture based on 3D sparse convolutions that outperforms current state-of-the-art methods on the Oxford RobotCar and In-house datasets.

This paper presents MinkUNeXt, an effective and efficient architecture for place recognition from point clouds, built entirely on the new 3D MinkNeXt Block: a residual block composed of 3D sparse convolutions that follows the philosophy established by recent Transformers while using only simple 3D convolutions. Features are extracted at different scales by a U-Net encoder-decoder network and aggregated into a single descriptor by Generalized Mean Pooling (GeM). The proposed architecture demonstrates that it is possible to surpass the current state of the art by relying only on conventional 3D sparse convolutions, without more complex and sophisticated proposals such as Transformers, Attention Layers or Deformable Convolutions. The proposal has been thoroughly assessed on the Oxford RobotCar and In-house datasets, where MinkUNeXt outperforms other state-of-the-art methods.
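The aggregation step mentioned in the abstract, Generalized Mean Pooling, can be illustrated with a small sketch. This is not the authors' implementation: the function name `gem_pool`, the default exponent `p = 3.0`, and the `eps` clamp are assumptions chosen for illustration, and plain Python lists stand in for the sparse feature tensors a real network would produce. For each feature channel, GeM computes the p-th root of the mean of the p-th powers over all points, so `p = 1` reduces to average pooling and large `p` approaches max pooling.

```python
from typing import List, Sequence


def gem_pool(
    features: Sequence[Sequence[float]],
    p: float = 3.0,      # assumed default exponent; learnable in many GeM variants
    eps: float = 1e-6,   # assumed clamp so that powers of non-positive values stay defined
) -> List[float]:
    """Pool an N x C list of per-point features into one C-dimensional descriptor."""
    if not features:
        raise ValueError("features must contain at least one point")
    num_points = len(features)
    num_channels = len(features[0])
    descriptor = []
    for channel in range(num_channels):
        # Mean of the clamped activations raised to the power p ...
        mean_pow = sum(max(row[channel], eps) ** p for row in features) / num_points
        # ... followed by the p-th root.
        descriptor.append(mean_pow ** (1.0 / p))
    return descriptor


# Two points with two feature channels each (made-up values).
points = [[1.0, 3.0], [3.0, 5.0]]
avg_like = gem_pool(points, p=1.0)   # behaves like average pooling
gem_default = gem_pool(points)       # lies between the channel mean and the channel max
```

With `p = 1.0` the result equals the per-channel mean, while larger exponents pull each channel toward its maximum, which is why GeM is often described as interpolating between average and max pooling.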

Code Implementations: 1 repo