CV, Jan 7, 2021

Self-Supervised Pretraining of 3D Features on any Point-Cloud

Zaiwei Zhang, Rohit Girdhar, Armand Joulin, Ishan Misra

arXiv:2101.02691v135.9342 citations, Has Code

Originality: Highly original

AI Analysis

This work addresses the lack of large annotated datasets for 3D recognition tasks, providing a label-efficient pretraining solution for researchers and practitioners working with 3D data.

This paper introduces a self-supervised pretraining method for 3D features that works on diverse point-cloud data without requiring 3D registration. The method achieves state-of-the-art results across 9 benchmarks, including 69.0% mAP for object detection on ScanNet and 63.5% mAP on SUNRGBD, and can outperform supervised pretraining.

Pretraining on large labeled datasets is a prerequisite to achieve good performance in many computer vision tasks like 2D object recognition, video classification, etc. However, pretraining is not widely used for 3D recognition tasks, where state-of-the-art methods train models from scratch. A primary reason is the lack of large annotated datasets, because 3D data is both difficult to acquire and time-consuming to label. We present a simple self-supervised pretraining method that can work with any 3D data - single or multiview, indoor or outdoor, acquired by varied sensors, without 3D registration. We pretrain standard point-cloud and voxel-based model architectures, and show that joint pretraining further improves performance. We evaluate our models on 9 benchmarks for object detection, semantic segmentation, and object classification, where they achieve state-of-the-art results and can outperform supervised pretraining. We set a new state-of-the-art for object detection on ScanNet (69.0% mAP) and SUNRGBD (63.5% mAP). Our pretrained models are label-efficient and improve performance for classes with few examples.
