CV | Jan 23, 2024

Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?

arXiv:2401.12902v1 | 43 citations | h-index: 18 | ICLR
Originality: Synthesis-oriented
AI Analysis

This work provides insights for researchers and practitioners in computer vision on optimizing parameter-efficient transfer learning, though it is incremental as it clarifies existing methods rather than introducing new ones.

The paper examines when and why Visual Prompt Tuning (VPT) outperforms full finetuning for vision models. Across 19 datasets and tasks, it finds that VPT is preferable when the original and downstream task objectives differ substantially, or when the data distributions of the two tasks are similar.

As the scale of vision models continues to grow, the emergence of Visual Prompt Tuning (VPT) as a parameter-efficient transfer learning technique has gained attention due to its superior performance compared to traditional full-finetuning. However, the conditions favoring VPT (the "when") and the underlying rationale (the "why") remain unclear. In this paper, we conduct a comprehensive analysis across 19 distinct datasets and tasks. To understand the "when" aspect, we identify the scenarios where VPT proves favorable along two dimensions: task objectives and data distributions. We find that VPT is preferable when there is 1) a substantial disparity between the original and the downstream task objectives (e.g., transitioning from classification to counting), or 2) a similarity in data distributions between the two tasks (e.g., both involve natural images). In exploring the "why" dimension, our results indicate that VPT's success cannot be attributed solely to overfitting and optimization considerations. The unique way VPT preserves original features and adds parameters appears to be a pivotal factor. Our study provides insights into VPT's mechanisms and offers guidance for its optimal utilization.
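
To make "preserves original features and adds parameters" concrete, the sketch below counts trainable parameters under full finetuning and under two common VPT variants. It is illustrative arithmetic only, not the paper's code. The backbone sizes (a ViT-B-like encoder with hidden size 768 and 12 layers), the prompt length of 50, the 100-class head, and the "shallow"/"deep" variant names are all assumptions that do not come from the abstract. The sketch also assumes the task head is trained in every setting, and it ignores the patch and position embeddings.

```python
# Illustrative parameter counting only; every size below is an assumption,
# not a value reported in the paper.
HIDDEN_DIM = 768    # assumed ViT-B-like hidden size
NUM_LAYERS = 12     # assumed number of Transformer encoder layers
NUM_PROMPTS = 50    # hypothetical number of prompt tokens
NUM_CLASSES = 100   # hypothetical downstream label count


def backbone_params(d: int = HIDDEN_DIM, layers: int = NUM_LAYERS) -> int:
    """Weights in a standard Transformer encoder stack (embeddings excluded)."""
    attention = 4 * d * d + 4 * d   # Q, K, V and output projections, with biases
    mlp = 8 * d * d + 5 * d         # d -> 4d -> d feed-forward, with biases
    layer_norms = 4 * d             # two LayerNorms, each with scale and shift
    return layers * (attention + mlp + layer_norms)


def head_params(d: int = HIDDEN_DIM, classes: int = NUM_CLASSES) -> int:
    """Weights in a linear classification head."""
    return d * classes + classes


def trainable_params(method: str) -> int:
    """Trainable weights per method; the head is assumed trainable in all cases."""
    head = head_params()
    if method == "full":
        # Full finetuning updates every backbone weight, so original features can drift.
        return backbone_params() + head
    if method == "vpt_shallow":
        # Backbone frozen; prompt tokens are added at the input layer only.
        return NUM_PROMPTS * HIDDEN_DIM + head
    if method == "vpt_deep":
        # Backbone frozen; a separate set of prompt tokens is added at every layer.
        return NUM_LAYERS * NUM_PROMPTS * HIDDEN_DIM + head
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    full = trainable_params("full")
    for method in ("full", "vpt_shallow", "vpt_deep"):
        count = trainable_params(method)
        print(f"{method:12s} {count:>12,d}  ({100 * count / full:.2f}% of full)")
```

Under these assumed sizes, both prompt variants train well under 1% of the weights that full finetuning updates, and the pretrained backbone weights stay untouched.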

Code Implementations: 1 repo
