Asymmetric Dual Self-Distillation for 3D Self-Supervised Representation Learning
This paper addresses the scarcity of labeled data for 3D point cloud representation learning in computer vision with a self-supervised method that outperforms prior approaches.
The paper tackles the challenge of learning semantically meaningful representations from unstructured 3D point clouds without large labeled datasets. It proposes AsymDSD, an asymmetric dual self-distillation framework that unifies masked modeling and invariance learning, and reports state-of-the-art results: 90.53% on ScanObjectNN, rising to 93.72% when pretrained on 930k shapes.
Learning semantically meaningful representations from unstructured 3D point clouds remains a central challenge in computer vision, especially in the absence of large-scale labeled datasets. While masked point modeling (MPM) is widely used in self-supervised 3D learning, its reconstruction-based objective can limit its ability to capture high-level semantics. We propose AsymDSD, an Asymmetric Dual Self-Distillation framework that unifies masked modeling and invariance learning through prediction in the latent space rather than the input space. AsymDSD builds on a joint embedding architecture and introduces several key design choices: an efficient asymmetric setup, disabling attention between masked queries to prevent shape leakage, multi-mask sampling, and a point cloud adaptation of multi-crop. AsymDSD achieves state-of-the-art results on ScanObjectNN (90.53%) and further improves to 93.72% when pretrained on 930k shapes, surpassing prior methods.
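Two of the design choices named above, multi-mask sampling and disabling attention between masked queries, can be illustrated with a small sketch. This is not the authors' implementation: the function names `sample_masks` and `build_attention_mask`, the uniform random masking strategy, and the rule that a masked query may still attend to itself are all assumptions made for illustration. The sketch only shows the general idea that several masks can be drawn over the same set of point patches, and that an attention mask can stop masked queries from seeing one another while leaving the visible context available to them.

```python
import random


def sample_masks(num_patches, mask_ratio, num_masks, seed=0):
    """Draw several independent masks over the same patches (hypothetical helper).

    Each mask is a list of booleans, True meaning "this patch is masked".
    Uniform random masking is an assumption, not the paper's strategy.
    """
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masks = []
    for _ in range(num_masks):
        masked = set(rng.sample(range(num_patches), num_masked))
        masks.append([i in masked for i in range(num_patches)])
    return masks


def build_attention_mask(is_masked):
    """Return allowed[q][k], True if query q may attend to key k.

    Assumed rule: every token may attend to visible tokens, and a masked
    query may additionally attend to itself, but never to another masked
    query. Visible tokens therefore never attend to masked positions.
    """
    n = len(is_masked)
    return [
        [(not is_masked[k]) or (q == k) for k in range(n)]
        for q in range(n)
    ]


if __name__ == "__main__":
    # Three masks over eight patches, half of them masked each time.
    for mask in sample_masks(num_patches=8, mask_ratio=0.5, num_masks=3):
        print(mask)

    # Patches 1 and 2 are masked; they cannot attend to each other.
    for row in build_attention_mask([False, True, True, False]):
        print(row)
```

In a framework such as PyTorch, a boolean matrix like this would typically be passed as the attention mask of the predictor's attention layers. Sampling several masks per input lets one encoding of the full point cloud serve multiple masked predictions, which is one plausible source of the efficiency the abstract mentions.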