PointRFT: Explicit Reinforcement Fine-tuning for Point Cloud Few-shot Learning
This work addresses data scarcity in 3D perception for applications such as robotics and autonomous driving, though the contribution is incremental: it adapts existing RL methods to a new domain.
The paper tackles fine-tuning of 3D point cloud models with limited data by proposing PointRFT, a reinforcement fine-tuning paradigm that outperforms supervised fine-tuning and achieves state-of-the-art performance on few-shot classification benchmarks.
Understanding spatial dynamics and semantics in point clouds is fundamental to comprehensive 3D understanding. While reinforcement learning algorithms such as Group Relative Policy Optimization (GRPO) have recently achieved remarkable breakthroughs in large language models by incentivizing reasoning capabilities through strategic reward design, their potential remains largely unexplored in the 3D perception domain. This naturally raises a pivotal question: Can RL-based methods effectively empower 3D point cloud fine-tuning? In this paper, we propose PointRFT, the first reinforcement fine-tuning paradigm tailored specifically for point cloud representation learning. We select three prevalent 3D foundation models and devise specialized accuracy reward and dispersion reward functions to stabilize training and mitigate distribution shifts. Through comprehensive few-shot classification experiments comparing distinct training paradigms, we demonstrate that PointRFT consistently outperforms vanilla supervised fine-tuning (SFT) across diverse benchmarks. Furthermore, when integrated into a hybrid Pretraining-SFT-RFT paradigm, PointRFT substantially improves the representational capacity of point cloud foundation models, achieving state-of-the-art performance, particularly in data-scarce scenarios.
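As a rough illustration of the group-relative idea behind GRPO mentioned above, the sketch below scores several sampled class predictions for one point cloud and normalizes their rewards within that group. It is a minimal sketch under stated assumptions, not the paper's implementation: the 0/1 `accuracy_reward`, the helper names, the sample values, and the use of a population standard deviation are all hypothetical choices made here. The dispersion reward is left out because the abstract does not define it.

```python
from statistics import mean, pstdev


def accuracy_reward(predicted_label: int, true_label: int) -> float:
    """Hypothetical 0/1 accuracy reward: 1.0 for a correct class, else 0.0."""
    return 1.0 if predicted_label == true_label else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group (GRPO-style advantage).

    Each advantage is (reward - group mean) / (group std + eps), so a sample
    is credited only for doing better than its siblings in the same group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# One point cloud with (assumed) true class 3, and four class predictions
# sampled from the policy for that same input.
sampled_predictions = [3, 1, 3, 7]
rewards = [accuracy_reward(p, 3) for p in sampled_predictions]
advantages = group_relative_advantages(rewards)
```

In this sketch, correct predictions receive positive advantages and incorrect ones negative advantages. When every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal.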