CV MM Sep 21, 2024

CUS3D: CLIP-based Unsupervised 3D Segmentation via Object-level Denoise

Fuyang Yu, Runze Tian, Zhen Wang, Xiaochuan Wang, Xiaohui Liang

arXiv:2409.13982v12.0

Originality: Incremental advance

AI Analysis

This work addresses the difficulty of acquiring annotation labels in 3D data for researchers and practitioners in computer vision, but it appears incremental as it builds on existing CLIP-based methods by focusing on noise reduction.

The paper tackles the noise introduced when 2D features are projected into 3D for unsupervised 3D segmentation. It proposes CUS3D, a distillation learning framework with an object-level denoising module, and reports improved unsupervised and open-vocabulary segmentation performance.

To ease the difficulty of acquiring annotation labels in 3D data, a common approach is unsupervised and open-vocabulary semantic segmentation, which leverages 2D CLIP semantic knowledge. In this paper, unlike previous research that ignores the "noise" introduced during feature projection from 2D to 3D, we propose a novel distillation learning framework named CUS3D. In our approach, an object-level denoising projection module is designed to screen out the "noise" and ensure more accurate 3D features. Based on the obtained features, a multimodal distillation learning module is designed to align the 3D features with the CLIP semantic feature space under object-centered constraints to achieve advanced unsupervised semantic segmentation. We conduct comprehensive experiments in both unsupervised and open-vocabulary segmentation, and the results consistently show our model's superiority in unsupervised segmentation and its effectiveness in open-vocabulary segmentation.
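
The two modules described in the abstract can be illustrated with a toy sketch. The abstract does not say how the denoising or the alignment is computed, so everything here is an assumption made for illustration: "object-level denoising" is approximated by averaging the projected 2D features over the points of each object, and the distillation objective is approximated by a mean cosine distance. The function names `object_level_pool`, `distillation_loss`, and `classify` are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def object_level_pool(point_feats, object_ids):
    """Assumed stand-in for object-level denoising: replace each point's
    projected feature with the mean feature of the object it belongs to."""
    sums = {}
    counts = defaultdict(int)
    for feat, oid in zip(point_feats, object_ids):
        if oid not in sums:
            sums[oid] = [0.0] * len(feat)
        sums[oid] = [s + f for s, f in zip(sums[oid], feat)]
        counts[oid] += 1
    means = {oid: [s / counts[oid] for s in total] for oid, total in sums.items()}
    return [means[oid] for oid in object_ids]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def distillation_loss(student_feats, target_feats):
    """Assumed alignment objective: mean (1 - cosine similarity) between the
    3D network's features and the pooled CLIP-space targets."""
    losses = [1.0 - cosine_similarity(s, t)
              for s, t in zip(student_feats, target_feats)]
    return sum(losses) / len(losses)


def classify(feat, text_embeddings):
    """Open-vocabulary labeling: pick the class name whose text embedding
    is most similar to the given feature."""
    return max(text_embeddings,
               key=lambda name: cosine_similarity(feat, text_embeddings[name]))


# Toy data: four points, two objects, 2-dimensional stand-ins for CLIP features.
projected = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
object_ids = [0, 0, 1, 1]
targets = object_level_pool(projected, object_ids)
labels = {"chair": [1.0, 0.0], "table": [0.0, 1.0]}  # hypothetical text embeddings
predictions = [classify(t, labels) for t in targets]
```

In this sketch every point of an object shares one target feature, so a single badly projected point has less influence on the label its object receives. A real pipeline would use actual CLIP image and text embeddings and a trained 3D backbone in place of the toy lists.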
