CV · Jun 28, 2023

Understanding Prompt Tuning for V-L Models Through the Lens of Neural Collapse

Tsinghua
arXiv:2306.15955v3 · 3 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work aims to improve how well vision-language models generalize to downstream tasks, particularly under class imbalance, by drawing on insights from neural collapse. It represents an incremental advance in prompt tuning techniques.

The paper tackles the problem of understanding and improving prompt tuning in vision-language models by analyzing text-to-image representations through the lens of neural collapse (NC). It finds that the NC optimality of these representations correlates with downstream performance, especially under class imbalance. The authors propose Neural-collapse-anchored Prompt Tuning (NPT), which improves existing prompt tuning methods across 11 datasets in both balanced and imbalanced settings.
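In the neural collapse literature, the optimal arrangement of K class prototypes is a simplex equiangular tight frame (ETF): unit vectors whose pairwise cosine similarity is exactly -1/(K-1). The sketch below shows one simple way to score how close a set of class prototypes (for example, the text embeddings produced for each class prompt) is to that structure. It is an illustrative assumption, not the paper's NC metric; the function name `etf_deviation` is hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def etf_deviation(prototypes):
    """Hypothetical NC score: mean absolute gap between every pairwise
    cosine and the simplex-ETF value -1/(K-1).
    0.0 means the prototypes are equiangular and maximally separated."""
    k = len(prototypes)
    target = -1.0 / (k - 1)
    gaps = [abs(cosine(u, v) - target) for u, v in combinations(prototypes, 2)]
    return sum(gaps) / len(gaps)


# Three 2-D vectors 120 degrees apart form a simplex ETF (score ~0.0);
# three orthogonal vectors have cosine 0 instead of -0.5 (score 0.5).
etf3 = [[1.0, 0.0], [-0.5, math.sqrt(3) / 2], [-0.5, -math.sqrt(3) / 2]]
orthogonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(etf_deviation(etf3), etf_deviation(orthogonal))
```

Under this toy score, a lower value means the class prototypes sit closer to the NC-optimal geometry; the paper's finding is that closeness of this kind tracks downstream accuracy.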

Large-scale vision-language (V-L) models have demonstrated remarkable generalization to downstream tasks through prompt tuning. However, the mechanisms behind the learned text representations remain unknown, which limits further generalization gains, especially under class imbalance. Recent advances in understanding the neural collapse (NC) phenomenon in vision-only models suggest that the optimal representation structure is the simplex equiangular tight frame (ETF), which paves the way for studying representations in V-L models. In this paper, we make the first attempt to use NC to examine the representations in V-L models learned via prompt tuning. We find that the NC optimality of text-to-image representations correlates positively with downstream generalizability, and that this correlation is more pronounced under class imbalance. To improve the representations, we propose Neural-collapse-anchored Prompt Tuning (NPT), a novel method that learns prompts whose text and image representations satisfy the same simplex ETF. NPT incorporates two regularization terms, language-modality collapse and multi-modality isomorphism, and it is compatible with other prompt tuning methods. Extensive experiments show that NPT consistently improves existing prompt tuning techniques across 11 datasets in both balanced and imbalanced settings.
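The abstract names two regularizers that tie text and image representations to a shared simplex ETF. As a minimal sketch, assuming each term is a simple "distance to a fixed ETF anchor" penalty (the paper's exact loss formulas are not given here), the code builds a K-class simplex ETF as sqrt(K/(K-1)) * (e_i - (1/K) * 1) and scores text class embeddings and image class means against the same anchors. The names `simplex_etf`, `anchor_loss`, and `npt_style_regularizers` are hypothetical, and this is not the authors' implementation.

```python
import math


def simplex_etf(k):
    """K unit vectors in R^K forming a simplex ETF:
    row i is sqrt(K/(K-1)) * (e_i - (1/K) * ones)."""
    scale = math.sqrt(k / (k - 1))
    return [[scale * ((1.0 if j == i else 0.0) - 1.0 / k) for j in range(k)]
            for i in range(k)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def anchor_loss(features, anchors):
    """Mean (1 - cosine) between each class feature and its fixed ETF anchor."""
    return sum(1.0 - cosine(f, a) for f, a in zip(features, anchors)) / len(anchors)


def npt_style_regularizers(text_protos, image_means, anchors):
    """Hypothetical stand-ins for NPT's two terms (assumed forms, not the paper's)."""
    # Language-modality collapse (assumed): pull each class's text embedding to its anchor.
    l_text = anchor_loss(text_protos, anchors)
    # Multi-modality isomorphism (assumed): pull each class's image mean to the SAME anchor.
    l_image = anchor_loss(image_means, anchors)
    return l_text, l_image


anchors = simplex_etf(4)
print(cosine(anchors[0], anchors[1]))  # pairwise cosine of a 4-class ETF is -1/3
print(npt_style_regularizers(anchors, anchors, anchors))  # both terms ~0.0
```

For simplicity the sketch uses embedding dimension d = K; real V-L embeddings have d larger than K, so the anchors would first be mapped into the embedding space by a matrix with orthonormal columns. In actual training these terms would be written in a differentiable framework and added to the usual prompt tuning loss.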

