CV · Mar 13, 2024

An Empirical Study of Parameter Efficient Fine-tuning on Vision-Language Pre-train Model

arXiv:2403.08433v2 · 1 citation · h-index: 18 · ICME
AI Analysis

This study offers guidance for choosing training strategies in vision-language fine-tuning, though its contribution is incremental because it builds on existing PEFT methods.

The study investigates how data size and fine-tunable parameter size affect the performance of Parameter Efficient Fine-Tuning (PEFT) techniques on vision-language models. It finds that more data and more tunable parameters improve performance only when the downstream task is inconsistent with pre-training. When the downstream task is consistent with pre-training, data size has no effect and the influence of parameter size is non-monotonic.

Recent studies have applied Parameter Efficient Fine-Tuning techniques (PEFTs) to efficiently narrow the performance gap between pre-training and downstream tasks. Two factors are important for any PEFT: the accessible data size and the fine-tunable parameter size. A natural expectation is that the performance of a PEFT is positively related to both. However, based on an evaluation of five PEFTs on two downstream vision-language (VL) tasks, we find that this intuition holds only if the downstream data and task are not consistent with pre-training. For downstream fine-tuning that is consistent with pre-training, data size no longer affects performance, while the influence of fine-tunable parameter size is not monotonic. We believe this observation can guide the choice of training strategy for various PEFTs.
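
To make "fine-tunable parameter size" and "not monotonic" concrete, here is a minimal Python sketch. It assumes a low-rank adapter (LoRA-style) as the example PEFT, which may or may not be among the five methods evaluated in the paper. The layer width, the ranks, and the score values are hypothetical placeholders chosen for illustration and are not results from the study.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one low-rank adapter.

    Assumes the usual factorisation into A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def is_monotonic_non_decreasing(values) -> bool:
    """Return True if each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical setup: one 768 x 768 projection, adapted at increasing ranks.
ranks = [1, 4, 16, 64]
budgets = [lora_trainable_params(768, 768, r) for r in ranks]

# Placeholder scores (NOT from the paper), one per parameter budget above.
hypothetical_scores = [70.1, 71.3, 70.8, 71.0]

# The parameter budget grows with rank, but the placeholder scores do not
# grow with it, which is the kind of pattern "not monotonic" describes.
budget_grows = is_monotonic_non_decreasing(budgets)
score_grows = is_monotonic_non_decreasing(hypothetical_scores)
```

With real measurements in place of the placeholders, the same check could be run separately for tasks that are consistent and inconsistent with pre-training.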

