CV | Mar 3, 2025

Diversity Covariance-Aware Prompt Learning for Vision-Language Models

arXiv:2503.01531v1 | 1 citation | h-index: 14 | Pattern Recognition
Originality: Incremental advance
AI Analysis

This work addresses the challenge of adapting vision-language models to specific applications through better prompt tuning, representing an incremental improvement over existing methods.

The paper aims to improve few-shot learning in vision-language models with a Diversity Covariance-Aware prompt learning framework, which models covariance relationships between visual features and learns multiple diverse soft prompts. The authors report state-of-the-art results on 11 datasets across several tasks.

Prompt tuning can further enhance the performance of vision-language models across various downstream tasks (e.g., few-shot learning), enabling them to better adapt to specific applications and needs. In this paper, we present a Diversity Covariance-Aware framework that learns distributional information from the data to enhance the few-shot ability of the prompt model. First, we propose a covariance-aware method that models the covariance relationships between visual features and uses the anisotropic Mahalanobis distance, instead of the suboptimal cosine distance, to measure the similarity between the two modalities. We rigorously derive and prove the validity of this modeling process. Then, we propose a diversity-aware method, which learns multiple diverse soft prompts to capture different attributes of each category and aligns them independently with the visual modality. This method achieves multi-centered covariance modeling, leading to more diverse decision boundaries. Extensive experiments on 11 datasets across various tasks demonstrate the effectiveness of our method.
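To make the two ideas in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a single covariance matrix shared by all classes, estimated from visual features and shrunk toward a scaled identity for stability in the few-shot regime. It also assumes that each class is scored by its nearest prompt center. All function names, the shrinkage step, and the nearest-center aggregation are hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np

def estimate_covariance(features, shrinkage=0.1):
    """Covariance of (n, d) visual features, shrunk toward a scaled identity.

    Shrinkage is an assumption added here so the matrix stays invertible
    when only a few samples are available.
    """
    cov = np.cov(features, rowvar=False)
    d = cov.shape[0]
    target = np.trace(cov) / d * np.eye(d)
    return (1.0 - shrinkage) * cov + shrinkage * target

def mahalanobis_sq(x, centers, cov):
    """Squared Mahalanobis distance from feature x (d,) to each center (k, d)."""
    diff = centers - x                      # (k, d)
    solved = np.linalg.solve(cov, diff.T)   # (d, k), avoids an explicit inverse
    return np.einsum("kd,dk->k", diff, solved)

def multi_prompt_predict(x, prompt_feats, cov):
    """Classify x given several prompt features per class.

    prompt_feats has shape (num_classes, num_prompts, d). Each prompt acts as
    one center; a class is scored by its nearest center (an assumed
    aggregation rule, used only for this sketch).
    """
    num_classes, num_prompts, d = prompt_feats.shape
    dists = mahalanobis_sq(x, prompt_feats.reshape(-1, d), cov)
    dists = dists.reshape(num_classes, num_prompts)
    return int(np.argmin(dists.min(axis=1)))

# Toy example: features vary much more along the first axis than the second.
cov = np.diag([4.0, 1.0])
x = np.array([2.0, 0.0])
centers = np.array([[0.0, 0.0], [2.0, 1.5]])

euclid = np.linalg.norm(centers - x, axis=1)
maha = mahalanobis_sq(x, centers, cov)
print("Euclidean nearest:", int(np.argmin(euclid)))
print("Mahalanobis nearest:", int(np.argmin(maha)))
```

In this toy case the isotropic distance picks the second center, while the anisotropic distance picks the first. An offset along the high-variance axis is discounted, and that is the intuition behind replacing an isotropic similarity with a covariance-aware one.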

