CV · Sep 12, 2023

Dynamic Visual Prompt Tuning for Parameter Efficient Transfer Learning

arXiv:2309.06123v15.05 citations · h-index: 3

Originality: Highly original

AI Analysis

This work addresses the need for more effective and parameter-efficient transfer learning in computer vision, offering a novel approach that improves performance for downstream visual tasks.

The paper tackles the problem of efficiently adapting large pre-trained models to visual tasks. It proposes Dynamic Visual Prompt Tuning (DVPT), which generates instance-specific prompts to capture the unique visual features of each image. DVPT outperforms other parameter-efficient methods and even surpasses full fine-tuning on 17 out of 19 tasks.

Parameter-efficient transfer learning (PETL) is an emerging research topic that aims to adapt large-scale pre-trained models to downstream tasks. Recent advances have achieved great success in reducing storage and computation costs. However, these methods do not take into account instance-specific visual clues for visual tasks. In this paper, we propose a Dynamic Visual Prompt Tuning framework (DVPT), which generates a dynamic instance-wise token for each image. In this way, it captures the unique visual features of each image, which makes it better suited to downstream visual tasks. We design a Meta-Net module that generates learnable prompts conditioned on each image, thereby capturing dynamic instance-wise visual features. Extensive experiments on a wide range of downstream recognition tasks show that DVPT outperforms other PETL methods. More importantly, DVPT even outperforms full fine-tuning on 17 out of 19 downstream tasks while maintaining high parameter efficiency. Our code will be released soon.
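
To make the idea of an instance-wise prompt concrete, here is a minimal NumPy sketch of what a Meta-Net-style module could look like. The abstract does not specify the architecture, so everything below is an assumption for illustration: the mean pooling, the two-layer bottleneck with a tanh activation, the single prompt token, and the hypothetical names `init_meta_net`, `dynamic_prompt`, and `prepend_prompt`. It is not the authors' implementation.

```python
import numpy as np


def init_meta_net(dim, hidden, rng):
    """Create weights for a hypothetical two-layer bottleneck Meta-Net (assumed design)."""
    return {
        "w1": rng.normal(0.0, 0.1, size=(dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, size=(hidden, dim)),
        "b2": np.zeros(dim),
    }


def dynamic_prompt(params, tokens):
    """Map one image's patch tokens (num_patches, dim) to one prompt token (dim,)."""
    pooled = tokens.mean(axis=0)                            # summarize the image (assumed pooling)
    hidden = np.tanh(pooled @ params["w1"] + params["b1"])  # bottleneck projection
    return hidden @ params["w2"] + params["b2"]             # project back to token width


def prepend_prompt(params, tokens):
    """Return the token sequence with the instance-wise prompt placed first."""
    prompt = dynamic_prompt(params, tokens)
    return np.concatenate([prompt[None, :], tokens], axis=0)


# Two random "images", each represented by 4 patch tokens of width 8.
rng = np.random.default_rng(0)
params = init_meta_net(dim=8, hidden=2, rng=rng)
image_a = rng.normal(size=(4, 8))
image_b = rng.normal(size=(4, 8))
out_a = prepend_prompt(params, image_a)
out_b = prepend_prompt(params, image_b)
```

Because the prompt is computed from the image's own tokens, `out_a[0]` and `out_b[0]` differ, whereas a static prompt would be the same vector for every input. In a PETL setting, only the small set of Meta-Net weights in `params` would typically be trained while the pre-trained backbone stays frozen.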
