CV | Sep 12, 2023

Dynamic Visual Prompt Tuning for Parameter Efficient Transfer Learning

arXiv:2309.06123v1 | 5 citations | h-index: 3
Originality: Highly original
AI Analysis

This work addresses the need for more effective and parameter-efficient transfer learning in computer vision, offering a novel approach that improves performance for downstream visual tasks.

The paper tackles the problem of efficiently adapting large pre-trained models to visual tasks. It proposes Dynamic Visual Prompt Tuning (DVPT), which generates instance-specific prompts to capture the unique visual features of each image. DVPT outperforms other parameter-efficient methods and even surpasses full fine-tuning on 17 out of 19 tasks.

Parameter-efficient transfer learning (PETL) is an emerging research topic that aims to adapt large-scale pre-trained models to downstream tasks. Recent advances have achieved great success in saving storage and computation costs. However, these methods do not take into account instance-specific visual clues for visual tasks. In this paper, we propose a Dynamic Visual Prompt Tuning framework (DVPT), which generates a dynamic instance-wise token for each image. In this way, it captures the unique visual features of each image, making it more suitable for downstream visual tasks. We design a Meta-Net module that generates learnable prompts conditioned on each image, thereby capturing dynamic instance-wise visual features. Extensive experiments on a wide range of downstream recognition tasks show that DVPT outperforms other PETL methods. More importantly, DVPT even outperforms full fine-tuning on 17 out of 19 downstream tasks while maintaining high parameter efficiency. Our code will be released soon.
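
To make the idea of an instance-wise prompt concrete, here is a minimal sketch in plain Python. The abstract does not specify the Meta-Net architecture, so everything below is an assumption for illustration: the mean-pooling step, the two-layer bottleneck MLP, the single prompt token prepended to the patch tokens, and the names `MetaNet`, `add_dynamic_prompt`, `embed_dim`, and `hidden_dim` are all hypothetical. A real implementation would likely use a deep learning framework and a frozen pre-trained backbone.

```python
import math
import random


def make_linear(rng, in_dim, out_dim):
    """Return (weights, bias) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(x, layer):
    """Apply a dense layer to a single vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def mean_pool(tokens):
    """Average a list of token vectors into one summary vector."""
    count = len(tokens)
    return [sum(column) / count for column in zip(*tokens)]


class MetaNet:
    """Hypothetical bottleneck MLP: embed_dim -> hidden_dim -> embed_dim.

    Assumed design, not the authors' confirmed architecture.
    """

    def __init__(self, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.down = make_linear(rng, embed_dim, hidden_dim)
        self.up = make_linear(rng, hidden_dim, embed_dim)

    def __call__(self, patch_tokens):
        summary = mean_pool(patch_tokens)            # one vector per image
        hidden = [max(0.0, h) for h in linear(summary, self.down)]  # ReLU
        return linear(hidden, self.up)               # the dynamic prompt token

    def num_params(self):
        """Count the trainable weights and biases in both layers."""
        return sum(len(w) * len(w[0]) + len(b) for w, b in (self.down, self.up))


def add_dynamic_prompt(patch_tokens, meta_net):
    """Prepend an image-conditioned prompt token to the patch tokens."""
    prompt = meta_net(patch_tokens)
    return [prompt] + patch_tokens


# Usage: a toy "image" of 4 patch tokens with 8 dimensions each.
rng = random.Random(1)
patch_tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
meta_net = MetaNet(embed_dim=8, hidden_dim=2)
sequence = add_dynamic_prompt(patch_tokens, meta_net)
print(len(sequence), "tokens;", meta_net.num_params(), "trainable parameters")
```

In this sketch only the small Meta-Net would be trained, which is where the parameter efficiency would come from. Because the prompt is computed from the pooled patch tokens, it generally changes from image to image, unlike a static prompt that is shared across all inputs.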

