CV · Nov 27, 2023

Text2Loc: 3D Point Cloud Localization from Natural Language

arXiv:2311.15977v2 · 68 citations · h-index: 13
Originality: Highly original
AI Analysis

The paper addresses precise 3D localization from natural language for robotics and autonomous systems. It represents a strong, specific gain rather than a broad breakthrough.

The paper tackles 3D point cloud localization from natural language descriptions by introducing Text2Loc, a neural network that improves localization accuracy by up to 2× over state-of-the-art methods on the KITTI360Pose dataset.

We tackle the problem of 3D point cloud localization based on a few natural linguistic descriptions and introduce a novel neural network, Text2Loc, that fully interprets the semantic relationship between points and text. Text2Loc follows a coarse-to-fine localization pipeline: text-submap global place recognition, followed by fine localization. In global place recognition, relational dynamics among each textual hint are captured in a hierarchical transformer with max-pooling (HTM), whereas a balance between positive and negative pairs is maintained using text-submap contrastive learning. Moreover, we propose a novel matching-free fine localization method to further refine the location predictions, which completely removes the need for complicated text-instance matching and is lighter, faster, and more accurate than previous methods. Extensive experiments show that Text2Loc improves the localization accuracy by up to 2× over the state-of-the-art on the KITTI360Pose dataset. Our project page is publicly available at https://yan-xia.github.io/projects/text2loc/.
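
The coarse-to-fine pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The embeddings are hand-written toy vectors standing in for learned text and submap encoders, the function names `retrieve_top_k` and `refine` are hypothetical, and the fine stage is assumed to add a predicted 2D offset to the retrieved submap's center. The offset here is a fixed placeholder where a real model would regress it.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_top_k(text_emb, submap_embs, k=3):
    """Coarse stage (hypothetical helper): rank submaps by similarity to the text."""
    scores = [(cosine(text_emb, emb), idx) for idx, emb in enumerate(submap_embs)]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [idx for _, idx in scores[:k]]

def refine(center, offset):
    """Fine stage (assumed form): shift the submap center by a predicted 2D offset."""
    return (center[0] + offset[0], center[1] + offset[1])

# Toy data: three submaps with made-up embeddings and centers in meters.
submap_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
submap_centers = [(0.0, 0.0), (30.0, 0.0), (30.0, 30.0)]
text_emb = [0.9, 0.1]  # placeholder for an encoded text description

top = retrieve_top_k(text_emb, submap_embs, k=2)

# Placeholder offset; a trained model would predict this from the text and submap.
position = refine(submap_centers[top[0]], (2.5, -1.0))
```

With these toy vectors, the coarse stage ranks submap 0 first and submap 2 second, and the fine stage shifts the center of submap 0 by the placeholder offset. Retrieving several candidates rather than one mirrors the usual top-k evaluation of place recognition, where a query counts as successful if the correct submap appears among the k best matches.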

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
