Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Cloud
This work addresses the pre-training cost and the reliance on reconstruction or extra modalities in point cloud self-supervised learning, and represents an incremental improvement over existing methods.
The paper tackles lengthy pre-training and the reliance on input-space reconstruction or additional modalities in self-supervised learning for point clouds by introducing Point-JEPA, a joint embedding predictive architecture that achieves results competitive with state-of-the-art methods.
Recent advances in self-supervised learning for point clouds have demonstrated significant potential. However, these methods often suffer from drawbacks such as lengthy pre-training, the need for reconstruction in the input space, or the need for additional modalities. To address these issues, we propose Point-JEPA, a joint embedding predictive architecture designed specifically for point cloud data. To this end, we introduce a sequencer that orders point cloud patch embeddings so that their proximity can be computed efficiently from their indices during target and context selection. The sequencer also allows the proximity computation for the patch embeddings to be shared between context and target selection, further improving efficiency. Experimentally, our method achieves results competitive with state-of-the-art methods while avoiding reconstruction in the input space and additional modalities.
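
To make the role of the sequencer concrete, the following is a minimal sketch, not the authors' implementation. It assumes that patches are summarized by 3D center coordinates, that the ordering is built greedily by repeatedly visiting the nearest unvisited center, and that targets and context are drawn as contiguous index ranges of the resulting sequence. The names `greedy_sequence` and `select_blocks`, the starting index, and the block lengths are hypothetical and chosen only for illustration.

```python
import numpy as np


def greedy_sequence(centers: np.ndarray, start: int = 0) -> np.ndarray:
    """Order patch indices so that neighbors in the sequence tend to be
    spatially close (assumed greedy nearest-neighbor rule)."""
    n = centers.shape[0]
    visited = np.zeros(n, dtype=bool)
    order = np.empty(n, dtype=np.int64)
    current = start
    for i in range(n):
        order[i] = current
        visited[current] = True
        if i == n - 1:
            break
        # Distance from the current center to every center; visited ones are masked out.
        dist = np.linalg.norm(centers - centers[current], axis=1)
        dist[visited] = np.inf
        current = int(np.argmin(dist))
    return order


def select_blocks(n: int, target_len: int, context_len: int,
                  rng: np.random.Generator):
    """Pick sequence positions for one target block and one context block.

    Both blocks are contiguous index ranges, so spatial proximity is read
    off the indices and the ordering is computed once and shared by both
    selections. Positions that fall in the target are removed from the
    context. Choosing context_len > target_len keeps the context non-empty.
    """
    t0 = int(rng.integers(0, n - target_len + 1))
    target = np.arange(t0, t0 + target_len)
    c0 = int(rng.integers(0, n - context_len + 1))
    context = np.setdiff1d(np.arange(c0, c0 + context_len), target)
    return context, target


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = rng.normal(size=(64, 3))       # stand-in for patch centers
    order = greedy_sequence(centers)          # computed once per point cloud
    ctx_pos, tgt_pos = select_blocks(64, 16, 40, rng)
    ctx_patches, tgt_patches = order[ctx_pos], order[tgt_pos]
    print(ctx_patches.shape, tgt_patches.shape)
```

Under these assumptions, the only distance computations happen while building the ordering; selecting spatially coherent context and target blocks afterwards reduces to slicing index ranges, which is one way to read the efficiency claim above.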