CV | Nov 25, 2022

Language-Assisted 3D Feature Learning for Semantic Scene Understanding

Tsinghua
arXiv:2211.14091v2 | 8 citations | h-index: 30 | Has Code
Originality: Incremental advance
AI Analysis

This work aims to improve 3D scene understanding for applications such as object detection and 3D-language multimodal tasks. It is an incremental advance: it builds on existing methods by adding language assistance during training.

The paper tackles the problem of learning descriptive 3D features for semantic scene understanding. It uses textual descriptions to guide feature learning toward important geometric attributes and scene context, which improves performance on 3D-only and 3D-language tasks, especially in label-deficient regimes.

Learning descriptive 3D features is crucial for understanding 3D scenes with diverse objects and complex structures. However, it is usually unknown whether important geometric attributes and scene context receive enough emphasis in an end-to-end trained 3D scene understanding network. To guide 3D feature learning toward important geometric attributes and scene context, we explore the help of textual scene descriptions. Given some free-form descriptions paired with 3D scenes, we extract knowledge about object relationships and object attributes. We then inject this knowledge into 3D feature learning through three classification-based auxiliary tasks. This language-assisted training can be combined with modern object detection and instance segmentation methods to promote 3D semantic scene understanding, especially in a label-deficient regime. Moreover, the 3D features learned with language assistance are better aligned with language features, which can benefit various 3D-language multimodal tasks. Experiments on several benchmarks of 3D-only and 3D-language tasks demonstrate the effectiveness of our language-assisted 3D feature learning. Code is available at https://github.com/Asterisci/Language-Assisted-3D.
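
The abstract describes adding classification-based auxiliary tasks to the training of a 3D network. As a rough illustration of that general idea only, the sketch below adds weighted softmax cross-entropy terms from auxiliary classification heads to a main task loss. It is not the authors' implementation: the function names, the task keys (`task_a`, `task_b`, `task_c`), and the `aux_weight` value are hypothetical, and the abstract does not specify the tasks, labels, or loss weighting.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample; `target` is a class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def language_assisted_loss(main_loss, aux_logits, aux_targets, aux_weight=0.1):
    """Main task loss plus weighted auxiliary classification losses.

    `aux_logits` and `aux_targets` are dicts keyed by a hypothetical task
    name; in this sketch the targets stand in for labels that would be
    derived from the paired scene descriptions. `aux_weight` is an
    assumed hyperparameter, not a value from the paper.
    """
    aux = sum(
        cross_entropy(aux_logits[name], aux_targets[name])
        for name in aux_logits
    )
    return main_loss + aux_weight * aux


# Hypothetical usage: three auxiliary heads with two classes each.
logits = {"task_a": [2.0, 0.5], "task_b": [0.1, 1.2], "task_c": [0.0, 0.0]}
targets = {"task_a": 0, "task_b": 1, "task_c": 0}
total = language_assisted_loss(main_loss=1.0, aux_logits=logits, aux_targets=targets)
```

Because the auxiliary terms only add to the training objective, a setup like this leaves the main detection or segmentation head unchanged at inference time; whether the released code follows this pattern should be checked against the repository.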

Code Implementations: 1 repo
