PointGame: Geometrically and Adaptively Masked Auto-Encoder on Point Clouds
This work addresses self-supervised learning for point cloud understanding, which is important for applications such as 3D vision and robotics. It appears incremental, however, as it builds on existing masked auto-encoder and geometric descriptor techniques.
The authors tackle the challenge of learning discriminative and transferable features from irregular and sparse point clouds by proposing PointGame, a geometrically and adaptively masked auto-encoder, and report clear advantages over competitors on various downstream tasks.
Self-supervised learning is attracting considerable attention in point cloud understanding. However, learning discriminative and transferable features remains challenging because point clouds are irregular and sparse. We propose a geometrically and adaptively masked auto-encoder for self-supervised learning on point clouds, termed \textit{PointGame}. PointGame contains two core components: GATE and EAT. GATE, the geometrical and adaptive token embedding module, not only absorbs the conventional wisdom of geometric descriptors that capture surface shape effectively, but also exploits adaptive saliency to focus on the salient parts of a point cloud. EAT, the external attention-based Transformer encoder, has linear computational complexity, which increases the efficiency of the whole pipeline. Unlike cutting-edge unsupervised learning models, PointGame leverages geometric descriptors to perceive surface shapes and adaptively mines discriminative features from training data. PointGame shows clear advantages over its competitors on various downstream tasks under both global and local fine-tuning strategies. The code and pre-trained models will be publicly available.
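To illustrate why an external attention layer can scale linearly with the number of point tokens, the following is a minimal sketch of generic external attention, not the authors' EAT implementation. It assumes the common formulation in which tokens attend to two small learnable memories rather than to each other, with a double normalization (softmax over tokens, then L1 over memory slots). The function name `external_attention`, the memories `mem_k` and `mem_v`, and all sizes are hypothetical choices made for this example.

```python
import math
import random


def external_attention(tokens, mem_k, mem_v, eps=1e-9):
    """Generic external attention (illustrative sketch, not the paper's code).

    tokens: N vectors of length d (e.g. point-patch embeddings)
    mem_k, mem_v: S memory vectors of length d each (assumed learnable)
    Cost is O(N * S * d): linear in N, because S is a fixed small constant.
    """
    n, s, d = len(tokens), len(mem_k), len(mem_v[0])

    # Similarity of every token to every memory slot: an N x S matrix.
    attn = [[sum(a * b for a, b in zip(t, k)) for k in mem_k] for t in tokens]

    # Normalization 1: softmax over the token axis, one slot at a time.
    for j in range(s):
        col_max = max(attn[i][j] for i in range(n))
        exps = [math.exp(attn[i][j] - col_max) for i in range(n)]
        total = sum(exps)
        for i in range(n):
            attn[i][j] = exps[i] / total

    # Normalization 2: L1 over the memory-slot axis, then read out from mem_v.
    out = []
    for row in attn:
        total = sum(row) + eps
        weights = [w / total for w in row]
        out.append([sum(weights[j] * mem_v[j][c] for j in range(s))
                    for c in range(d)])
    return out


if __name__ == "__main__":
    random.seed(0)
    N, S, D = 64, 8, 16  # hypothetical sizes: tokens, memory slots, channels
    tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
    mem_k = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(S)]
    mem_v = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(S)]
    out = external_attention(tokens, mem_k, mem_v)
    print(len(out), len(out[0]))
```

Because the N x S similarity matrix replaces the N x N matrix of self-attention, doubling the number of tokens roughly doubles the work instead of quadrupling it, provided the memory size S stays fixed.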