CV · Jan 29, 2023

Debiased Fine-Tuning for Vision-language Models by Prompt Regularization

arXiv:2301.12429v3 · 36 citations · h-index: 30
Originality: Highly original
AI Analysis

This paper addresses data bias in fine-tuning for researchers and practitioners who use vision-language models, though it is an incremental improvement over existing regularization techniques.

The paper tackles the problem of fine-tuning vision-language models overfitting to biased downstream task data by introducing Prompt Regularization (ProReg), which uses prompt-based predictions from the pretrained model as a regularizer, resulting in consistently strong performance on out-of-distribution benchmarks compared to existing methods.

We present a new paradigm for fine-tuning large-scale vision-language pre-trained models on downstream tasks, dubbed Prompt Regularization (ProReg). Unlike traditional fine-tuning, which easily overfits to the downstream task data, ProReg uses the prediction obtained by prompting the pretrained model to regularize the fine-tuning. The motivation is: by prompting the large model with "a photo of a [CLASS]", the fill-in answer depends only on the pretrained encyclopedic knowledge and is independent of the task data distribution, which is usually biased. Specifically, given a training sample prediction during fine-tuning, we first calculate its Kullback-Leibler loss against the prompt prediction and its cross-entropy loss against the ground-truth label, and then combine them with a proposed sample-wise adaptive trade-off weight, which automatically adjusts the transfer between the pretrained and downstream domains. On various out-of-distribution benchmarks, we show the consistently strong performance of ProReg compared with conventional fine-tuning, zero-shot prompting, prompt tuning, and other state-of-the-art methods.
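The loss combination described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the formula for the sample-wise adaptive weight, so `weight` is passed in by the caller, and the KL direction (prompt prediction relative to fine-tuned prediction) as well as the convex combination are assumptions. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Cross-entropy of the predicted distribution against a hard label."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def combined_loss(finetuned_logits, prompt_logits, label, weight):
    """Blend the KL term and the cross-entropy term for one sample.

    `weight` in [0, 1] stands in for the paper's sample-wise adaptive
    trade-off weight, whose exact formula is NOT given in the abstract.
    Assumption: weight scales the KL term, (1 - weight) scales the CE term.
    """
    q = softmax(finetuned_logits)   # prediction of the model being fine-tuned
    p = softmax(prompt_logits)      # zero-shot prediction from the prompt
    ce = cross_entropy(q, label)    # fit to the (possibly biased) task label
    kl = kl_divergence(p, q)        # stay close to the pretrained prediction
    return weight * kl + (1.0 - weight) * ce


# Usage: one sample with three classes and an arbitrary weight of 0.5.
loss = combined_loss([2.0, 0.5, -1.0], [1.0, 1.5, -0.5], label=0, weight=0.5)
print(loss)
```

With `weight` set to 0 the sketch reduces to ordinary cross-entropy fine-tuning, and with `weight` set to 1 it only matches the prompt prediction; the paper's adaptive weight chooses a value per sample between these extremes.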

