CV · AI · Aug 27, 2024

CVPT: Cross Visual Prompt Tuning

arXiv:2408.14961v2 · 3 citations · h-index: 16 · Has Code
Originality: Incremental advance
AI Analysis

CVPT addresses the inefficient fine-tuning of large-scale visual models with a competitive prompt-based alternative. The advance is incremental, since it builds on existing VPT methods.

The paper tackles the performance and efficiency limitations of Visual Prompt Tuning (VPT) in computer vision by proposing Cross Visual Prompt Tuning (CVPT), which uses a cross-attention module to preserve the integrity of self-attention. CVPT achieves over 4% higher average accuracy than VPT on the VTAB-1K benchmark, rivaling adapter-based methods.

Parameter-Efficient Fine-Tuning (PEFT) has emerged to mitigate the computational demands of large-scale models. Within computer vision, adapter-based PEFT methods are often favored over prompt-based approaches like Visual Prompt Tuning (VPT) due to the latter's performance and efficiency limitations. Our analysis reveals that VPT's shortcomings stem from its prompt deployment strategy, which can distort the model's inherent self-attention mechanism. To address this, we propose Cross Visual Prompt Tuning (CVPT). CVPT introduces a cross-attention module to directly model interactions between prompts and image tokens. This design decouples the prompts from the input sequence, preserving the original self-attention integrity while enabling efficient feature integration. Furthermore, we employ a weight-sharing mechanism for cross-attention initialization, which enhances representative capability without a large parameter overhead. Extensive experiments across 25 datasets show that CVPT significantly outperforms VPT. For instance, on the VTAB-1K benchmark, CVPT achieves over 4% higher average accuracy, rivaling leading adapter-based methods in both performance and efficiency. Our work confirms that prompt-based methods can achieve exceptional results in visual fine-tuning. The code is available at https://github.com/Lingyun0419/CVPT
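
The contrast between placing prompts in the input sequence (VPT) and attending to them through a separate cross-attention step (CVPT) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The single-head attention, the placement of cross-attention after self-attention with a residual add, the reuse of the self-attention projections as the cross-attention weights, and all names (`attend`, `cvpt_block`, `vpt_image_mass`) are assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention_weights(queries, context, w_q, w_k):
    """Row-wise softmax of scaled dot-product scores (single head)."""
    q, k = matmul(queries, w_q), matmul(context, w_k)
    scale = 1.0 / math.sqrt(len(q[0]))
    weights = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        peak = max(scores)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def attend(queries, context, w_q, w_k, w_v):
    """Attention output: softmax(Q K^T / sqrt(d)) V."""
    return matmul(attention_weights(queries, context, w_q, w_k), matmul(context, w_v))


def cvpt_block(image_tokens, prompts, w_q, w_k, w_v):
    """Assumed CVPT-style step: untouched self-attention, then cross-attention to prompts."""
    self_out = attend(image_tokens, image_tokens, w_q, w_k, w_v)
    # Weight sharing (assumption): cross-attention reuses the self-attention projections.
    cross_out = attend(self_out, prompts, w_q, w_k, w_v)
    # Residual add (assumption): prompt information is fused onto the image features.
    fused = [[s + c for s, c in zip(s_row, c_row)]
             for s_row, c_row in zip(self_out, cross_out)]
    return self_out, fused


def rand_matrix(rows, cols, rng):
    """Small random matrix; stands in for real embeddings and pretrained weights."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, n_img, n_prompt = 4, 3, 2
image_tokens = rand_matrix(n_img, dim, rng)
prompts = rand_matrix(n_prompt, dim, rng)
w_q, w_k, w_v = (rand_matrix(dim, dim, rng) for _ in range(3))

# VPT-style: prompts join the sequence, so image tokens share softmax mass with them.
sequence = image_tokens + prompts
vpt_weights = attention_weights(sequence, sequence, w_q, w_k)
vpt_image_mass = [sum(row[:n_img]) for row in vpt_weights[:n_img]]

# CVPT-style: self-attention sees image tokens only; prompts enter via cross-attention.
self_out, cvpt_out = cvpt_block(image_tokens, prompts, w_q, w_k, w_v)

print("attention mass kept on image tokens under VPT:",
      [round(m, 3) for m in vpt_image_mass])
print("CVPT output shape:", (len(cvpt_out), len(cvpt_out[0])))
```

In this toy setup the self-attention output is the same whatever prompts are supplied, whereas in the VPT-style pass each image token necessarily spends part of its attention on prompt positions. The sketch also hints at the cost difference: self-attention over n image tokens plus p prompts scales with (n + p)^2, while the separate cross-attention step scales with n × p.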
