CVMar 28, 2024

Efficient 3D Instance Mapping and Localization with Neural Fields

MIT
arXiv:2403.19797v55 citationsh-index: 25ICRA
Originality Incremental advance
AI Analysis

This addresses the need for faster and more effective 3D scene understanding in computer vision, though it appears incremental as it builds on prior neural field approaches.

The paper tackles the problem of learning implicit scene representations for 3D instance segmentation from posed RGB images, introducing 3DIML, which achieves a large practical speedup over existing methods with comparable quality.

We tackle the problem of learning an implicit scene representation for 3D instance segmentation from a sequence of posed RGB images. Towards this, we introduce 3DIML, a novel framework that efficiently learns a neural label field which can render 3D instance segmentation masks from novel viewpoints. Opposed to prior art that optimizes a neural field in a self-supervised manner, requiring complicated training procedures and loss function design, 3DIML leverages a two-phase process. The first phase, InstanceMap, takes as input 2D segmentation masks of the image sequence generated by a frontend instance segmentation model, and associates corresponding masks across images to 3D labels. These almost 3D-consistent pseudolabel masks are then used in the second phase, InstanceLift, to supervise the training of a neural label field, which interpolates regions missed by InstanceMap and resolves ambiguities. Additionally, we introduce InstanceLoc, which enables near realtime localization of instance masks given a trained neural label field. We evaluate 3DIML on sequences from the Replica and ScanNet datasets and demonstrate its effectiveness under mild assumptions for the image sequences. We achieve a large practical speedup over existing implicit scene representation methods with comparable quality, showcasing its potential to facilitate faster and more effective 3D scene understanding.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes