CV · RO · Nov 22, 2024

Open-Vocabulary Online Semantic Mapping for SLAM

arXiv:2411.15043v3 · 13 citations · h-index: 4 · IEEE Robot Autom Lett
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and accurate real-time semantic mapping in robotics and autonomous systems. It is an incremental improvement built on the integration of novel methods.

The paper tackles the problem of building open-vocabulary 3D semantic maps online for SLAM. It detects and tracks 3D segments and describes them with CLIP vectors fused by a novel merging method. The resulting pipeline has a significantly lower computational and memory footprint than offline baselines and achieves better segmentation metrics than both offline and online ones.
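Because every segment ends up with a CLIP vector, the map can be queried with arbitrary text: embed each prompt with CLIP's text encoder and compare it with every segment descriptor by cosine similarity. The sketch below shows only that final labeling step and assumes descriptors and text embeddings are already computed. The function names and the plain-list vector layout are hypothetical, and the toy vectors merely stand in for real CLIP outputs.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (CLIP comparisons use cosine similarity)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def label_segments(segment_desc: Sequence[Vector], text_emb: Sequence[Vector]) -> List[int]:
    """Hypothetical helper: return, for each segment descriptor, the index of
    the text-prompt embedding with the highest cosine similarity."""
    labels = []
    for desc in segment_desc:
        scores = [cosine(desc, t) for t in text_emb]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Toy example: 3 prompt embeddings and 3 segment descriptors in 4-D.
prompts = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
segments = [[0.9, 0.1, 0.0, 0.0], [0.0, 0.2, 0.8, 0.1], [0.1, 1.0, 0.0, 0.0]]
labels = label_segments(segments, prompts)
```

Taking the argmax over prompts is the simplest assignment rule; thresholding the best score instead would let a segment stay unlabeled when no prompt matches well.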

This paper presents an Open-Vocabulary Online 3D semantic mapping pipeline, which we denote by its acronym OVO. Given a sequence of posed RGB-D frames, we detect and track 3D segments, which we describe using CLIP vectors. These vectors are computed by a novel CLIP merging method from the viewpoints where the segments are observed. Notably, OVO has a significantly lower computational and memory footprint than offline baselines, while also achieving better segmentation metrics than both offline and online ones. Along with superior segmentation performance, we also show experimental results of our mapping contributions integrated with two different full SLAM backbones (Gaussian-SLAM and ORB-SLAM2), being the first to use a neural network to merge CLIP descriptors and to demonstrate end-to-end open-vocabulary online 3D mapping with loop closure.
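The abstract states that each segment's CLIP descriptor is merged from the viewpoints where the segment is observed, and that OVO performs this merge with a neural network. That network is not reproduced here. As a simple hand-crafted stand-in, the sketch keeps a running weighted sum of L2-normalized per-view CLIP vectors for each segment, which is cheap to update frame by frame in an online setting. The `SegmentDescriptor` class, the `update_map` helper, and the per-view `weight` (for example, a visibility score) are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


@dataclass
class SegmentDescriptor:
    """Hypothetical running merge of per-view CLIP vectors for one 3D segment.
    This is weighted averaging, NOT the learned merging network of OVO."""
    dim: int
    weighted_sum: List[float] = field(init=False)

    def __post_init__(self) -> None:
        self.weighted_sum = [0.0] * self.dim

    def add_view(self, clip_vec: Sequence[float], weight: float = 1.0) -> None:
        # Normalize first so every view contributes by direction, scaled only by its weight.
        for i, x in enumerate(l2_normalize(clip_vec)):
            self.weighted_sum[i] += weight * x

    def descriptor(self) -> List[float]:
        # Normalizing the weighted sum gives the same direction as the weighted mean.
        return l2_normalize(self.weighted_sum)


Observation = Tuple[int, Sequence[float], float]  # (segment id, per-view CLIP vector, weight)


def update_map(seg_map: Dict[int, SegmentDescriptor],
               observations: Iterable[Observation], dim: int) -> None:
    """Fold one frame's observations into the map; unseen segment ids get new entries."""
    for seg_id, clip_vec, weight in observations:
        if seg_id not in seg_map:
            seg_map[seg_id] = SegmentDescriptor(dim)
        seg_map[seg_id].add_view(clip_vec, weight)


# Two frames: segment 7 is seen twice, segment 9 once.
seg_map: Dict[int, SegmentDescriptor] = {}
update_map(seg_map, [(7, [1.0, 0.0], 3.0)], dim=2)
update_map(seg_map, [(7, [0.0, 2.0], 1.0), (9, [0.0, 1.0], 1.0)], dim=2)
merged = seg_map[7].descriptor()
```

A fixed averaging rule like this cannot learn which views are more informative beyond the supplied weight, which is one motivation for replacing it with a learned merger.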

Code implementations: 1 repo
