LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
This work targets the efficiency and accuracy of vision model adaptation for researchers and practitioners, and represents an incremental improvement over prior visual prompting techniques.
The paper tackles the limitations of existing visual prompting methods by proposing LoR-VP, a low-rank visual prompting design that improves interaction between prompts and images, resulting in up to 6 times faster training, 18 times fewer parameters, and a 3.1% performance gain.
Visual prompting has gained popularity as a method for adapting pre-trained models to specific tasks, particularly in the realm of parameter-efficient tuning. However, existing visual prompting techniques often pad the prompt parameters around the image, which limits the interaction between the visual prompts and the original image to a small set of patches and neglects the inductive bias present in information shared across different patches. In this study, we conduct a thorough preliminary investigation to identify and address these limitations. We propose a novel visual prompt design, Low-Rank matrix multiplication for Visual Prompting (LoR-VP), which captures both shared and patch-specific information across rows and columns of image pixels. Extensive experiments across seven network architectures and four datasets demonstrate significant improvements in both performance and efficiency compared to state-of-the-art visual prompting methods, achieving up to 6 times faster training, using 18 times fewer visual prompt parameters, and delivering a 3.1% improvement in performance. The code is available at https://github.com/jincan333/LoR-VP.
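To make the low-rank idea concrete, the following is a minimal sketch and not the released implementation. It assumes the prompt for one channel is the product of two small matrices, B (H x r) and A (r x W), added pixel-wise to the image, so that every row of the prompt is a combination of the same r shared rows of A, weighted by row-specific coefficients in B. The function names, the zero initialization of B, and the sizes (8 x 8 toy image, rank 2, and a 224 x 224 input with rank 4 for the parameter count) are illustrative assumptions, not values taken from the paper.

```python
import random


def low_rank_prompt(B, A):
    """Return the H x W matrix B @ A, where B is H x r and A is r x W."""
    rank = len(A)
    width = len(A[0])
    return [
        [sum(row_b[k] * A[k][j] for k in range(rank)) for j in range(width)]
        for row_b in B
    ]


def apply_prompt(image, B, A):
    """Add the low-rank prompt to every pixel of a single-channel H x W image."""
    prompt = low_rank_prompt(B, A)
    return [
        [pixel + delta for pixel, delta in zip(img_row, prompt_row)]
        for img_row, prompt_row in zip(image, prompt)
    ]


def prompt_param_count(height, width, rank, channels=3):
    """Trainable values if each channel has its own B (H x r) and A (r x W)."""
    return channels * rank * (height + width)


# Hypothetical toy sizes, chosen only to keep the example fast.
H, W, R = 8, 8, 2
rng = random.Random(0)
image = [[rng.random() for _ in range(W)] for _ in range(H)]

# Assumption: B starts at zero so the initial prompt leaves the image unchanged.
B = [[0.0] * R for _ in range(H)]
A = [[rng.gauss(0.0, 0.02) for _ in range(W)] for _ in range(R)]

prompted = apply_prompt(image, B, A)
assert prompted == image  # zero B gives a zero prompt

# Parameter count for an assumed 224 x 224 RGB input with rank 4,
# next to a dense prompt that stores one value per pixel.
print(prompt_param_count(224, 224, rank=4))  # 5376
print(3 * 224 * 224)                         # 150528
```

Under these assumptions the prompt touches every pixel of the image rather than only a padded border, while the number of trainable values grows with r(H + W) per channel instead of with H x W. The exact savings reported in the abstract depend on the baselines and settings used in the paper, which this sketch does not reproduce.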