CV · May 6, 2023

PointCMP: Contrastive Mask Prediction for Self-supervised Learning on Point Cloud Videos

arXiv:2305.04075v1 · 29 citations
Originality: Incremental advance
AI Analysis

This paper addresses the high labeling cost of point cloud videos and offers a self-supervised solution for researchers and practitioners in 3D vision. It appears incremental, however, because it builds on existing contrastive and masking techniques.

The paper tackles self-supervised learning on point cloud videos by proposing PointCMP, a contrastive mask prediction framework that combines a two-branch structure with feature-level hard sample generation. It reports state-of-the-art performance on benchmark datasets and outperforms existing fully supervised methods.

Self-supervised learning can extract representations of good quality from solely unlabeled data, which is appealing for point cloud videos due to their high labelling cost. In this paper, we propose a contrastive mask prediction (PointCMP) framework for self-supervised learning on point cloud videos. Specifically, our PointCMP employs a two-branch structure to achieve simultaneous learning of both local and global spatio-temporal information. On top of this two-branch structure, a mutual similarity based augmentation module is developed to synthesize hard samples at the feature level. By masking dominant tokens and erasing principal channels, we generate hard samples to facilitate learning representations with better discrimination and generalization performance. Extensive experiments show that our PointCMP achieves the state-of-the-art performance on benchmark datasets and outperforms existing full-supervised counterparts. Transfer learning results demonstrate the superiority of the learned representations across different datasets and tasks.
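To make the feature-level augmentation idea concrete, here is a minimal NumPy sketch of the two operations the abstract names: masking dominant tokens and erasing principal channels. The abstract does not give the scoring rules, so everything here is an assumption made for illustration: tokens are ranked by cosine similarity to a global feature, channels are ranked by their contribution to a dot product with a reference feature, and the function names, `ratio` parameters, and zero-filling are hypothetical, not the paper's implementation.

```python
import numpy as np


def mask_dominant_tokens(tokens, global_feat, ratio=0.5):
    """Zero out the tokens most similar to a global feature (assumed scoring rule).

    tokens: (N, C) array of local token features.
    global_feat: (C,) array summarising the whole clip.
    Returns the masked copy and the indices of the masked tokens.
    """
    t = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    g = global_feat / (np.linalg.norm(global_feat) + 1e-8)
    sim = t @ g                            # (N,) cosine similarity per token
    k = int(round(ratio * tokens.shape[0]))
    dominant = np.argsort(-sim)[:k]        # k most similar tokens
    masked = tokens.copy()
    masked[dominant] = 0.0
    return masked, dominant


def erase_principal_channels(feat, reference, ratio=0.25):
    """Zero out the channels contributing most to feat . reference (assumed rule).

    feat, reference: (C,) feature vectors.
    Returns the erased copy and the indices of the erased channels.
    """
    contrib = feat * reference             # per-channel share of the dot product
    k = int(round(ratio * feat.shape[0]))
    principal = np.argsort(-contrib)[:k]   # k largest contributions
    erased = feat.copy()
    erased[principal] = 0.0
    return erased, principal


if __name__ == "__main__":
    tokens = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
    masked, idx = mask_dominant_tokens(tokens, np.array([1.0, 0.0]), ratio=0.5)
    print(sorted(idx.tolist()))
    print(masked)
```

In this sketch the hard samples are the masked or erased features themselves; a contrastive objective would then have to match them to their unaltered counterparts using only the less dominant information that remains.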

Code Implementations: 1 repo
