CV · Nov 24, 2023

OneFormer3D: One Transformer for Unified Point Cloud Segmentation

arXiv:2311.14405v1 · 155 citations · h-index: 16
Originality: Incremental advance
AI Analysis

This paper addresses the inefficiency of maintaining task-specific models for 3D point cloud segmentation by offering a single model for multiple segmentation tasks. The advance is incremental, since it builds on existing transformer and kernel-based methods.

The paper tackles the problem of separate models for semantic, instance, and panoptic segmentation of 3D point clouds by proposing OneFormer3D, a unified transformer-based model. It reports state-of-the-art performance, including +2.1 mAP50 on the ScanNet test leaderboard and +21 PQ for panoptic segmentation on ScanNet.

Semantic, instance, and panoptic segmentation of 3D point clouds have been addressed using task-specific models of distinct design. Thereby, the similarity of all segmentation tasks and the implicit relationship between them have not been utilized effectively. This paper presents a unified, simple, and effective model addressing all these tasks jointly. The model, named OneFormer3D, performs instance and semantic segmentation consistently, using a group of learnable kernels, where each kernel is responsible for generating a mask for either an instance or a semantic category. These kernels are trained with a transformer-based decoder with unified instance and semantic queries passed as an input. Such a design enables training a model end-to-end in a single run, so that it achieves top performance on all three segmentation tasks simultaneously. Specifically, our OneFormer3D ranks 1st and sets a new state-of-the-art (+2.1 mAP50) in the ScanNet test leaderboard. We also demonstrate the state-of-the-art results in semantic, instance, and panoptic segmentation of ScanNet (+21 PQ), ScanNet200 (+3.8 mAP50), and S3DIS (+0.8 mIoU) datasets.
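The kernel idea in the abstract can be illustrated with a minimal sketch. It assumes that each kernel is a vector in the same space as the per-point features and that a mask comes from a sigmoid over their dot product. That rule is common in kernel-based segmentation but is an assumption here, not something this summary confirms. The function name `kernels_to_masks`, the sizes, and the random vectors are all hypothetical stand-ins. The paper's backbone, transformer decoder, and training procedure are not reproduced.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def kernels_to_masks(point_features, kernels, threshold=0.5):
    """Turn K kernels into K binary masks over N points.

    point_features: N vectors of length D (one per point).
    kernels: K vectors of length D (one per instance or semantic query).
    Assumed rule: mask[k][n] = sigmoid(kernel_k . feature_n) > threshold.
    """
    masks = []
    for kernel in kernels:
        row = []
        for feat in point_features:
            logit = sum(k * f for k, f in zip(kernel, feat))
            row.append(sigmoid(logit) > threshold)
        masks.append(row)
    return masks


rng = random.Random(0)
num_points, dim = 50, 8
num_instance_queries, num_semantic_classes = 8, 4


def random_vectors(count):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


# Random stand-ins for backbone features and decoder outputs (hypothetical).
point_features = random_vectors(num_points)
instance_kernels = random_vectors(num_instance_queries)
semantic_kernels = random_vectors(num_semantic_classes)

# One unified kernel set: instance kernels first, then one per semantic class.
kernels = instance_kernels + semantic_kernels
masks = kernels_to_masks(point_features, kernels)

instance_masks = masks[:num_instance_queries]
semantic_masks = masks[num_instance_queries:]
```

Because both kernel types go through the same mask function, a single forward pass yields instance masks and semantic masks together. This mirrors the unified design the abstract describes.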

Code Implementations: 1 repo
