CV · Jul 12, 2023

OG: Equip vision occupancy with instance segmentation and visual grounding

arXiv:2307.05873v1 · 1 citation · h-index: 11
Originality: Incremental advance
AI Analysis

This paper addresses 3D scene understanding for autonomous driving or robotics, offering an incremental improvement on existing occupancy prediction tasks.

It tackles two limitations of occupancy prediction by adding instance segmentation and visual grounding capabilities, so that different instances can be distinguished and grounding can be performed at the voxel level. The authors state that code will be released.

Occupancy prediction tasks focus on inferring both geometry and semantic labels for each voxel, which is an important perception task. However, occupancy prediction remains a semantic segmentation task that does not distinguish individual instances. Further, although some existing works, such as Open-Vocabulary Occupancy (OVO), have already addressed open-vocabulary detection, visual grounding in occupancy has not been solved, to the best of our knowledge. To tackle these two limitations, this paper proposes Occupancy Grounding (OG), a novel method that equips vanilla occupancy with instance segmentation ability and can perform visual grounding at the voxel level with the help of Grounded-SAM. The keys to our approach are (1) affinity field prediction for instance clustering and (2) an association strategy for aligning 2D instance masks with 3D occupancy instances. Extensive experiments have been conducted, whose visualization results and analysis are shown below. Our code will be publicly released soon.
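The abstract names the two key components but does not describe how they work. The following is a minimal sketch of how such a pipeline could look, not the authors' implementation. It assumes the affinity field stores, for each voxel, a score that the voxel and its next neighbour along each axis belong to the same instance; it clusters voxels with union-find, then matches each 3D instance to the 2D mask with the highest IoU after projection. The function names, array layouts, thresholds, and the precomputed `voxel_pixels` projection are all hypothetical.

```python
import numpy as np


def cluster_instances(semantic, affinity, threshold=0.5):
    """Group occupied voxels into instances (illustrative sketch only).

    semantic: (X, Y, Z) int array of class labels, 0 = empty.
    affinity: (3, X, Y, Z) float array; affinity[a, x, y, z] is the ASSUMED
              score that voxel (x, y, z) and its +1 neighbour along axis a
              belong to the same instance.
    Returns an (X, Y, Z) int array of instance ids, 0 = empty.
    """
    shape = semantic.shape
    parent = np.arange(semantic.size)

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    flat_ids = np.arange(semantic.size).reshape(shape)
    for axis in range(3):
        lo = [slice(None)] * 3
        hi = [slice(None)] * 3
        lo[axis] = slice(0, -1)    # each voxel ...
        hi[axis] = slice(1, None)  # ... and its +1 neighbour along this axis
        lo, hi = tuple(lo), tuple(hi)
        # Link neighbours that are occupied, share a class, and have high affinity.
        link = (
            (semantic[lo] > 0)
            & (semantic[lo] == semantic[hi])
            & (affinity[axis][lo] > threshold)
        )
        for a, b in zip(flat_ids[lo][link], flat_ids[hi][link]):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    roots = np.array([find(i) for i in range(semantic.size)]).reshape(shape)
    occupied = semantic > 0
    instances = np.zeros(shape, dtype=np.int64)
    # Relabel union-find roots to consecutive ids starting at 1.
    _, inverse = np.unique(roots[occupied], return_inverse=True)
    instances[occupied] = inverse + 1
    return instances


def associate(instances, voxel_pixels, masks_2d, min_iou=0.3):
    """Match each 3D instance to the 2D mask with the highest IoU.

    voxel_pixels: (X, Y, Z, 2) int array of (row, col) image coordinates per
                  voxel, ASSUMED precomputed by camera projection; -1 marks
                  voxels that fall outside the image.
    masks_2d: (M, H, W) bool array of 2D instance masks, M >= 1.
    Returns a dict mapping instance id to mask index.
    """
    _, height, width = masks_2d.shape
    matches = {}
    for inst_id in range(1, int(instances.max()) + 1):
        pix = voxel_pixels[instances == inst_id]
        pix = pix[(pix >= 0).all(axis=1)]
        projected = np.zeros((height, width), dtype=bool)
        projected[pix[:, 0], pix[:, 1]] = True
        inter = (masks_2d & projected).sum(axis=(1, 2))
        union = (masks_2d | projected).sum(axis=(1, 2))
        iou = inter / np.maximum(union, 1)
        best = int(iou.argmax())
        if iou[best] >= min_iou:
            matches[inst_id] = best
    return matches
```

In this sketch, grounding a text query at the voxel level would amount to looking up which 3D instance is associated with the 2D mask returned for that query. Whether the paper uses IoU matching or a thresholded affinity is not stated in the abstract.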

