Label-Efficient Grasp Joint Prediction with Point-JEPA
This work addresses data-efficient grasp learning for robotics, but its contribution is incremental: it applies an existing JEPA-style pretraining method to a specific domain.
The paper tackles label-efficient grasp joint-angle prediction using 3D self-supervised pretraining with Point-JEPA, reporting 26% lower RMSE when trained on 25% of the data and parity at full supervision.
We study whether 3D self-supervised pretraining with Point-JEPA enables label-efficient grasp joint-angle prediction. Meshes are sampled to point clouds and tokenized; a ShapeNet-pretrained Point-JEPA encoder feeds a $K{=}5$ multi-hypothesis head trained with winner-takes-all and evaluated by top-logit selection. On a multi-finger hand dataset with strict object-level splits, Point-JEPA improves top-logit RMSE and Coverage@15$^{\circ}$ in low-label regimes (e.g., 26% lower RMSE at 25% data) and reaches parity at full supervision, suggesting JEPA-style pretraining is a practical lever for data-efficient grasp learning.
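To make the multi-hypothesis setup concrete, the following is a minimal NumPy sketch of a winner-takes-all loss and top-logit selection. It is an illustration under stated assumptions, not the paper's implementation: the function names (`wta_loss`, `select_top_logit`, `rmse`), the array shapes, and the use of mean squared error as the per-hypothesis error are all assumptions. How the per-hypothesis logits are trained is not specified in the abstract and is not modelled here.

```python
import numpy as np

def wta_loss(hyps, target):
    """Winner-takes-all loss (assumed form: per-hypothesis MSE).

    hyps:   (B, K, D) array of K joint-angle hypotheses per sample.
    target: (B, D) array of ground-truth joint angles.
    Returns the batch-mean error of the best hypothesis and the winner indices.
    """
    # Per-hypothesis mean squared error, shape (B, K).
    err = ((hyps - target[:, None, :]) ** 2).mean(axis=-1)
    # Only the closest hypothesis per sample contributes to the loss.
    winners = err.argmin(axis=1)
    return float(err[np.arange(err.shape[0]), winners].mean()), winners

def select_top_logit(hyps, logits):
    """At evaluation time, keep the hypothesis with the highest logit.

    hyps:   (B, K, D) array of hypotheses.
    logits: (B, K) array of per-hypothesis scores.
    Returns the selected (B, D) predictions and the chosen indices.
    """
    idx = logits.argmax(axis=1)
    return hyps[np.arange(hyps.shape[0]), idx], idx

def rmse(pred, target):
    """Root-mean-square error over all samples and joints."""
    return float(np.sqrt(((pred - target) ** 2).mean()))
```

In this sketch the loss only looks at the hypothesis nearest the label, whereas evaluation cannot see the label and must rely on the logits. Top-logit RMSE can therefore be worse than the best-of-$K$ error whenever the highest-scoring hypothesis is not the closest one.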