CV | AI | Jul 27, 2023

Clustering based Point Cloud Representation Learning for 3D Analysis

arXiv:2307.14605v1 | 53 citations | h-index: 77
Originality: Incremental advance
AI Analysis

This work addresses the challenge of learning discriminative and robust representations for point cloud analysis, which is crucial for applications like autonomous driving and robotics, but it is incremental as it builds on existing network architectures.

The paper tackles the problem of learning an appropriate point embedding space for 3D point cloud analysis by proposing a clustering-based supervised learning scheme that discovers subclass patterns to improve robustness to variations. It achieves notable improvements, such as 2.0-2.6% mIoU gains on SemanticKITTI and 2.0-3.4% mAP gains on KITTI for segmentation and detection tasks.

Point cloud analysis (such as 3D segmentation and detection) is a challenging task, not only because of the irregular geometries of many millions of unordered points, but also because of the great variations caused by depth, viewpoint, occlusion, etc. Current studies focus heavily on adapting neural networks to the complex geometries of point clouds, but overlook a fundamental question: how to learn an appropriate point embedding space that is aware of both discriminative semantics and challenging variations? In response, we propose a clustering-based supervised learning scheme for point cloud analysis. Unlike the current de facto scene-wise training paradigm, our algorithm conducts within-class clustering on the point embedding space to automatically discover subclass patterns that are latent yet representative across scenes. The mined patterns are, in turn, used to repaint the embedding space, so as to respect the underlying distribution of the entire training dataset and improve robustness to the variations. Our algorithm is principled and can be readily plugged into modern point cloud segmentation networks during training, without extra overhead during testing. With various 3D network architectures (i.e., voxel-based, point-based, Transformer-based, automatically searched), our algorithm shows notable improvements on well-known point cloud segmentation datasets (i.e., 2.0-2.6% on single-scan and 2.0-2.2% on multi-scan SemanticKITTI, and 1.8-1.9% on S3DIS, in terms of mIoU). Our algorithm also demonstrates utility in 3D detection, showing 2.0-3.4% mAP gains on KITTI.
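The within-class clustering idea in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: it swaps in plain k-means for whatever clustering procedure the paper actually uses, and the function names `within_class_kmeans` and `subclass_pull_loss`, the parameter `num_subclasses`, and the squared-distance loss are all hypothetical choices made for illustration only.

```python
import numpy as np


def within_class_kmeans(embeddings, labels, num_subclasses=3, num_iters=10, seed=0):
    """Cluster the embeddings of each semantic class separately (plain k-means).

    Assumption: k-means stands in for the paper's clustering step.
    Returns (centers, assignments):
      centers     : dict mapping class id -> (K, D) array of subclass centers
      assignments : (N,) array with the subclass index of every point
    """
    embeddings = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    centers = {}
    assignments = np.full(len(labels), -1, dtype=np.int64)

    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        x = embeddings[idx]
        k = min(num_subclasses, len(x))
        # Initialise centers from k distinct points of this class.
        mu = x[rng.choice(len(x), size=k, replace=False)].copy()

        for _ in range(num_iters):
            # Squared distance of every point to every center: shape (n, k).
            d = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
            assign = d.argmin(axis=1)
            for j in range(k):
                members = x[assign == j]
                if len(members) > 0:  # keep the old center if a cluster empties
                    mu[j] = members.mean(axis=0)

        # Final assignment against the converged centers.
        d = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        centers[int(c)] = mu
        assignments[idx] = d.argmin(axis=1)

    return centers, assignments


def subclass_pull_loss(embeddings, labels, centers, assignments):
    """Mean squared distance from each point to its own subclass center.

    Assumption: a simple stand-in for a loss that reshapes the embedding
    space around the mined subclass patterns.
    """
    embeddings = np.asarray(embeddings, dtype=np.float64)
    targets = np.stack([centers[int(c)][a] for c, a in zip(labels, assignments)])
    return float(((embeddings - targets) ** 2).sum(axis=1).mean())
```

In a training loop, one would presumably run the clustering over embeddings gathered from many scenes rather than a single one, since the abstract stresses that the patterns should be representative across the whole training set. Because the clustering only shapes the training signal, nothing in this sketch would need to run at test time.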

Code Implementations: 1 repo