CV · Apr 10, 2024

O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation

arXiv:2404.06836v2 · 11 citations · h-index: 9 · ECCV
Originality: Incremental advance
AI Analysis

This work enables robots to build language-based scene maps in real-time, which is crucial for interactive applications, though it appears incremental relative to existing neural implicit mapping approaches.

The paper tackles online open-vocabulary mapping for robotic scene understanding. It addresses three challenges: local updates, semantic segmentation, and multi-view consistency. It reports improved accuracy over the previous state-of-the-art method on open-vocabulary object localization and semantic segmentation.

Online construction of open-ended language scenes is crucial for robotic applications that require open-vocabulary interactive scene understanding. Recently, neural implicit representations have provided a promising direction for online interactive mapping. However, integrating open-vocabulary scene understanding into online neural implicit mapping still faces three challenges: the lack of local scene-updating ability, blurry spatial hierarchical semantic segmentation, and difficulty in maintaining multi-view consistency. To this end, we propose O2V-mapping, which uses voxel-based language and geometric features to create an open-vocabulary field, allowing local updates during the online training process. Additionally, we leverage a foundation model for image segmentation to extract language features for object-level entities, achieving clear segmentation boundaries and hierarchical semantic features. To preserve the consistency of 3D object properties across viewpoints, we propose a spatial adaptive voxel adjustment mechanism and a multi-view weight selection method. Extensive experiments on open-vocabulary object localization and semantic segmentation demonstrate that O2V-mapping achieves online construction of language scenes with improved accuracy, outperforming the previous SOTA method.
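The abstract names voxel-based language features, local updates, multi-view weighting and open-vocabulary localization without giving their details. The Python sketch below illustrates how those ideas can fit together; it is not O2V-mapping's actual formulation. It assumes a sparse hash-map voxel grid in which each voxel keeps a weighted average of language embeddings and is queried by cosine similarity against a text embedding.

All of the following are hypothetical:

- the class `SparseLanguageVoxelGrid`
- the scalar per-view `view_weight`
- the 5 cm voxel size
- the 0.5 similarity threshold

The sketch does not model the neural implicit geometry, the segmentation foundation model, or the spatial adaptive voxel adjustment.

```python
import math
from collections import defaultdict

# Hypothetical parameters for illustration only; not taken from O2V-mapping.
VOXEL_SIZE = 0.05       # voxel edge length in metres
QUERY_THRESHOLD = 0.5   # minimum cosine similarity for a query hit


def voxel_key(point, voxel_size=VOXEL_SIZE):
    """Map a 3D point to the integer coordinates of the voxel containing it."""
    return tuple(math.floor(c / voxel_size) for c in point)


def normalize(vec):
    """Return a unit-length copy of vec (unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class SparseLanguageVoxelGrid:
    """Hypothetical sparse grid holding a weighted-average language feature per voxel."""

    def __init__(self, voxel_size=VOXEL_SIZE):
        self.voxel_size = voxel_size
        self.feature_sum = {}                 # voxel key -> weighted feature sum
        self.weight_sum = defaultdict(float)  # voxel key -> accumulated weight

    def integrate(self, points, features, view_weight):
        """Fuse one view's per-point features into the grid.

        view_weight is a hypothetical stand-in for a multi-view weighting rule.
        Only voxels hit by this view are modified; their keys are returned.
        """
        touched = set()
        for point, feat in zip(points, features):
            key = voxel_key(point, self.voxel_size)
            acc = self.feature_sum.setdefault(key, [0.0] * len(feat))
            for i, x in enumerate(feat):
                acc[i] += view_weight * x
            self.weight_sum[key] += view_weight
            touched.add(key)
        return touched

    def feature(self, key):
        """Return the unit-normalized fused feature of a voxel, or None if unweighted."""
        weight = self.weight_sum.get(key, 0.0)
        if weight <= 0.0:
            return None
        return normalize([x / weight for x in self.feature_sum[key]])

    def query(self, text_embedding, threshold=QUERY_THRESHOLD):
        """Rank voxels by cosine similarity to a text embedding, keeping those >= threshold."""
        text = normalize(text_embedding)
        hits = []
        for key in self.feature_sum:
            feat = self.feature(key)
            if feat is None:
                continue
            sim = sum(a * b for a, b in zip(feat, text))
            if sim >= threshold:
                hits.append((key, sim))
        return sorted(hits, key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    grid = SparseLanguageVoxelGrid()
    # Toy 2-D "language features"; real language embeddings are far higher-dimensional.
    grid.integrate([(0.01, 0.01, 0.01), (0.02, 0.0, 0.0)],
                   [[1.0, 0.0], [1.0, 0.0]], view_weight=1.0)
    grid.integrate([(0.52, 0.52, 0.52)], [[0.0, 1.0]], view_weight=0.5)
    print(grid.query([1.0, 0.0]))  # prints [((0, 0, 0), 1.0)]
```

Because features live in a sparse map, integrating one view only rewrites the voxels that view observes. That is the kind of local updating the abstract refers to. By contrast, a single global network's shared weights are touched by every update.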
