CV, AI, LG · Jun 7, 2023

Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion

arXiv:2306.04633v2 · 70 citations · h-index: 188
Originality: Incremental advance
AI Analysis

This work addresses 3D instance segmentation for robotics and AR/VR applications with a scalable method that needs neither object tracking across frames nor an upper bound on the number of objects. It is an incremental advance, since it builds on pre-trained 2D segmentation models.

The paper tackles 3D instance segmentation by lifting 2D segments to 3D with a neural field representation trained through slow-fast contrastive fusion. It reports state-of-the-art results on ScanNet, Hypersim, and Replica, as well as on a new Messy Rooms dataset with up to 500 objects per scene.
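The slow-fast contrastive fusion is only named here, not specified. The following is a minimal sketch under stated assumptions, not the authors' objective: we assume the slow field's weights are an exponential moving average of the fast field's, that each pixel carries a segment id from a pre-trained 2D segmenter, and that the loss is a softmax cross-entropy pulling each fast-field pixel embedding toward the slow-field centroid of its own 2D segment and away from the centroids of the other segments. The names `ema_update`, `slow_fast_contrastive_loss`, `momentum`, and `temperature` are hypothetical.

```python
import numpy as np

def ema_update(slow, fast, momentum=0.9):
    """Assumed slow-field update: each slow parameter tracks the fast one via an EMA."""
    return {k: momentum * slow[k] + (1.0 - momentum) * fast[k] for k in slow}

def slow_fast_contrastive_loss(fast_emb, slow_emb, seg_ids, temperature=0.1):
    """Hypothetical slow-fast contrastive loss (illustrative, not the paper's exact form).

    fast_emb: (N, D) pixel embeddings from the fast field (the branch that is trained).
    slow_emb: (N, D) embeddings of the same pixels from the slow field; in an autodiff
              framework these would be wrapped in a stop-gradient.
    seg_ids:  (N,) integer 2D segment id of each pixel.
    """
    segments = np.unique(seg_ids)  # sorted segment ids
    # Slow-field centroid of every 2D segment: (S, D)
    centroids = np.stack([slow_emb[seg_ids == s].mean(axis=0) for s in segments])
    # Logits are negative squared distances to each centroid, scaled by temperature: (N, S)
    d2 = ((fast_emb[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Column index of each pixel's own segment within `segments`
    targets = np.searchsorted(segments, seg_ids)
    # Cross-entropy: pull toward own centroid, push away from the others
    return -log_probs[np.arange(len(seg_ids)), targets].mean()

# Toy usage: three 2D segments with two pixels each
rng = np.random.default_rng(0)
seg = np.array([0, 0, 1, 1, 2, 2])
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
slow = centers[seg] + 0.01 * rng.standard_normal((6, 2))
aligned = centers[seg]               # fast embeddings agree with their segments
misaligned = centers[(seg + 1) % 3]  # fast embeddings sit on the wrong centroids
loss_aligned = slow_fast_contrastive_loss(aligned, slow, seg)
loss_misaligned = slow_fast_contrastive_loss(misaligned, slow, seg)

slow_params = {"w": np.array([1.0])}
fast_params = {"w": np.array([0.0])}
slow_params = ema_update(slow_params, fast_params, momentum=0.9)
```

In this sketch the number of segments enters only through the set of centroids, so nothing in the loss fixes an object count in advance, and drawing the targets from the slowly updated field keeps them from shifting abruptly between steps.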

Instance segmentation in 3D is a challenging task due to the lack of large-scale annotated datasets. In this paper, we show that this task can be addressed effectively by instead leveraging pre-trained 2D instance segmentation models. We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation, which encourages multi-view consistency across frames. The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects. Unlike previous approaches, our method does not require an upper bound on the number of objects or object tracking across frames. To demonstrate the scalability of the slow-fast clustering, we create a new semi-realistic dataset called the Messy Rooms dataset, which features scenes with up to 500 objects per scene. Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets, as well as on our newly created Messy Rooms dataset, demonstrating the effectiveness and scalability of our slow-fast clustering method.
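Because the method needs no upper bound on the number of objects, instance labels have to be recovered from the learned embeddings without choosing K in advance. As an illustration only, the sketch below uses a simple greedy leader clustering with a distance threshold; this is a swapped-in technique, not necessarily the clustering step the paper uses, and `cluster_without_k` and `radius` are hypothetical names. The toy data consists of 500 well-separated synthetic objects, echoing the scale of the Messy Rooms scenes.

```python
import numpy as np

def cluster_without_k(embeddings, radius=1.0):
    """Greedy leader clustering: the number of instances is not fixed in advance.

    Each point joins the nearest existing leader within `radius`;
    otherwise it becomes the leader of a new cluster.
    """
    leaders = []
    labels = np.empty(len(embeddings), dtype=int)
    for i, e in enumerate(embeddings):
        if leaders:
            dists = np.linalg.norm(np.asarray(leaders) - e, axis=1)
            j = int(np.argmin(dists))
            if dists[j] <= radius:
                labels[i] = j
                continue
        leaders.append(e)
        labels[i] = len(leaders) - 1
    return labels

# Toy usage: 500 synthetic "objects", 3 noisy embeddings each, spaced 10 units apart
rng = np.random.default_rng(0)
centers = np.zeros((500, 3))
centers[:, 0] = 10.0 * np.arange(500)
points = np.repeat(centers, 3, axis=0) + rng.normal(scale=0.05, size=(1500, 3))
labels = cluster_without_k(points, radius=1.0)
num_instances = len(set(labels.tolist()))
```

The threshold plays the role that K would play in k-means: it encodes how far apart two instances must be in embedding space, so the instance count falls out of the data rather than being fixed beforehand.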

Code Implementations: 1 repo
