CV | Aug 21, 2024

Positional Prompt Tuning for Efficient 3D Representation Learning

Shaochen Zhang, Zekun Qi, Runpei Dong, Xiuxiu Bai, Xing Wei

arXiv:2408.11567v2

Originality: Incremental advance

AI Analysis

This work addresses the problem of reducing the cost of fine-tuning pre-trained 3D models for point cloud analysis, offering an incremental improvement over existing methods.

The paper tackles efficient 3D representation learning by proposing Positional Prompt Tuning (PPT), a parameter-efficient fine-tuning method that increases the number of patch tokens and makes the positional encoding trainable while freezing most pre-trained parameters. It achieves state-of-the-art results, such as 95.01% accuracy on ScanObjectNN OBJ_BG, with only 1.05M trainable parameters.

We rethink the role of positional encoding in 3D representation learning and fine-tuning. We argue that positional encoding in point Transformer-based methods serves to aggregate multi-scale features of point clouds. Additionally, we explore parameter-efficient fine-tuning (PEFT) through the lens of prompts and adapters, introducing a straightforward yet effective method for point cloud analysis called Positional Prompt Tuning (PPT). PPT uses an increased number of patch tokens and a trainable positional encoding while keeping most pre-trained model parameters frozen. Extensive experiments validate that PPT is both effective and efficient. With only 1.05M trainable parameters, PPT achieves state-of-the-art results on several mainstream datasets, such as 95.01% accuracy on the ScanObjectNN OBJ_BG dataset. Code and weights will be released at https://github.com/zsc000722/PPT.
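The idea described in the abstract can be illustrated with a minimal numpy sketch, under stated assumptions. We assume a Point-MAE-style tokenizer (farthest point sampling for patch centers, k-nearest-neighbor grouping, a shared mini-PointNet patch embedding) and read "increased patch tokens" as sampling more patch centers. The parameter names (`embed.*`, `pos.*`), the sizes `D` and `H`, and all helper functions are hypothetical and not taken from the PPT code. Only the positional-encoding MLP is flagged trainable here. The frozen Transformer and the task head are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


def farthest_point_sample(points: np.ndarray, n_centers: int) -> np.ndarray:
    """Greedy farthest point sampling; returns indices of n_centers points."""
    n = points.shape[0]
    idx = np.zeros(n_centers, dtype=np.int64)  # idx[0] = 0: start at point 0
    dist = np.full(n, np.inf)  # squared distance to the nearest chosen center
    for i in range(1, n_centers):
        d = np.sum((points - points[idx[i - 1]]) ** 2, axis=1)
        dist = np.minimum(dist, d)
        idx[i] = int(np.argmax(dist))  # pick the point farthest from all centers
    return idx


def group_knn(points: np.ndarray, center_idx: np.ndarray, k: int):
    """Group the k nearest points around each center, in center-local coords."""
    centers = points[center_idx]  # (G, 3)
    d = np.sum((centers[:, None, :] - points[None, :, :]) ** 2, axis=-1)  # (G, N)
    nn = np.argsort(d, axis=1)[:, :k]  # (G, k)
    patches = points[nn] - centers[:, None, :]  # (G, k, 3)
    return centers, patches


# Hypothetical parameters: (array, trainable flag). Sizes are illustrative only.
D, H = 32, 16
params = {
    # frozen stand-in for a pre-trained one-layer mini-PointNet patch embedding
    "embed.W": (rng.normal(0.0, 0.1, (3, D)), False),
    "embed.b": (np.zeros(D), False),
    # trainable positional-encoding MLP applied to patch centers (the "prompt")
    "pos.W1": (rng.normal(0.0, 0.1, (3, H)), True),
    "pos.b1": (np.zeros(H), True),
    "pos.W2": (rng.normal(0.0, 0.1, (H, D)), True),
    "pos.b2": (np.zeros(D), True),
}


def p(name: str) -> np.ndarray:
    return params[name][0]


def tokenize(points: np.ndarray, n_centers: int, k: int) -> np.ndarray:
    """Return (n_centers, D) tokens that would be fed to a frozen Transformer."""
    centers, patches = group_knn(points, farthest_point_sample(points, n_centers), k)
    # frozen patch embedding: shared per-point linear + ReLU, max-pooled over k points
    feats = np.maximum(patches @ p("embed.W") + p("embed.b"), 0.0).max(axis=1)
    # trainable positional encoding from absolute center coordinates (ReLU MLP)
    hidden = np.maximum(centers @ p("pos.W1") + p("pos.b1"), 0.0)
    pos = hidden @ p("pos.W2") + p("pos.b2")
    return feats + pos


def count_params(params: dict) -> tuple:
    """Return (trainable, total) parameter counts."""
    total = sum(a.size for a, _ in params.values())
    trainable = sum(a.size for a, t in params.values() if t)
    return trainable, total


cloud = rng.uniform(-1.0, 1.0, size=(1024, 3))
tokens_64 = tokenize(cloud, n_centers=64, k=32)
tokens_128 = tokenize(cloud, n_centers=128, k=32)  # more patch tokens, same weights
print(tokens_64.shape, tokens_128.shape)  # (64, 32) (128, 32)
print(count_params(params))  # (608, 736)
```

Because the patch embedding and the positional MLP are applied per patch, the same weights accept 64 or 128 patch tokens. In a real fine-tuning loop for this sketch, only the entries flagged `True` would receive gradient updates.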

