CV · Mar 13, 2024

An Empirical Study of Parameter Efficient Fine-tuning on Vision-Language Pre-train Model

Yuxin Tian, Mouxing Yang, Yunfan Li, Dayiheng Liu, Xingzhang Ren, Xi Peng, Jiancheng Lv

arXiv:2403.08433v23.71 citations · h-index: 38 · ICME

Originality: Incremental advance

AI Analysis

The study provides guidance for selecting training strategies in vision-language fine-tuning, though its contribution is incremental because it builds on existing PEFT methods.

The study investigates how data size and fine-tunable parameter size affect the performance of Parameter Efficient Fine-Tuning (PEFT) techniques on vision-language models. It finds that performance improves with both factors only when the downstream data and task are inconsistent with pre-training. When they are consistent, data size has no effect and the influence of parameter size is non-monotonic.

Recent studies applied Parameter Efficient Fine-Tuning techniques (PEFTs) to efficiently narrow the performance gap between pre-training and downstream tasks. Two important factors for any PEFT are the accessible data size and the fine-tunable parameter size. A natural expectation is that the performance of PEFTs is positively related to both. However, according to the evaluation of five PEFTs on two downstream vision-language (VL) tasks, we find that this intuition holds only if the downstream data and task are not consistent with pre-training. For downstream fine-tuning consistent with pre-training, data size no longer affects the performance, while the influence of fine-tunable parameter size is not monotonic. We believe this observation could guide the choice of training strategy for various PEFTs.
