CV | Nov 23, 2024

Active Prompt Learning with Vision-Language Model Priors

arXiv:2411.16722v1 | 2 citations | h-index: 15
Originality: Incremental advance
AI Analysis

This work targets researchers and practitioners who need to adapt vision-language models efficiently, and it offers an incremental improvement in active learning strategies.

The paper tackles the inefficiency of adapting vision-language models to new tasks by proposing an active prompt learning framework that uses class-guided clustering and selective querying to reduce reliance on labeled data, achieving higher accuracy with fewer labels across nine datasets.

Vision-language models (VLMs) have demonstrated remarkable zero-shot performance across various classification tasks. Nonetheless, their reliance on hand-crafted text prompts for each task hinders efficient adaptation to new tasks. While prompt learning offers a promising solution, most studies focus on maximizing the utilization of given few-shot labeled datasets, often overlooking the potential of careful data selection strategies, which enable higher accuracy with fewer labeled data. This motivates us to study a budget-efficient active prompt learning framework. Specifically, we introduce a class-guided clustering that leverages the pre-trained image and text encoders of VLMs, thereby enabling our cluster-balanced acquisition function from the initial round of active learning. Furthermore, considering the substantial class-wise variance in confidence exhibited by VLMs, we propose a budget-saving selective querying based on adaptive class-wise thresholds. Extensive experiments in active learning scenarios across nine datasets demonstrate that our method outperforms existing baselines.
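The abstract names three components: class-guided clustering, cluster-balanced acquisition, and selective querying with adaptive class-wise thresholds. The sketch below is one possible reading of that pipeline, not the authors' implementation. Everything concrete in it is an assumption made for illustration: seeding k-means centroids with class text embeddings, using `1 - confidence` as the acquisition score, setting each class threshold to a quantile of that class's zero-shot confidences, and all function names and parameters (`class_guided_kmeans`, `cluster_balanced_select`, `selective_query`, `temperature`, `quantile`). Features are assumed to be precomputed by a VLM's image and text encoders.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Row-wise unit normalization so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def zero_shot_probs(image_feats, text_feats, temperature=100.0):
    # CLIP-style zero-shot class probabilities (temperature value is an assumption).
    logits = temperature * l2_normalize(image_feats) @ l2_normalize(text_feats).T
    logits -= logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def class_guided_kmeans(image_feats, text_feats, n_iters=10):
    # Assumption: "class-guided" is modelled as spherical k-means whose
    # centroids start at the class text embeddings (one cluster per class).
    feats = l2_normalize(image_feats)
    centroids = l2_normalize(text_feats).copy()
    for _ in range(n_iters):
        assign = np.argmax(feats @ centroids.T, axis=1)
        for k in range(len(centroids)):
            members = feats[assign == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
        centroids = l2_normalize(centroids)
    assign = np.argmax(feats @ centroids.T, axis=1)
    return assign, centroids

def cluster_balanced_select(assign, scores, budget):
    # Round-robin over clusters, taking the highest-scoring sample from each
    # cluster in turn until the budget is used up.
    queues = {}
    for k in np.unique(assign):
        idx = np.where(assign == k)[0]
        queues[int(k)] = list(idx[np.argsort(-scores[idx])])
    picked = []
    while len(picked) < budget and any(queues.values()):
        for k in sorted(queues):
            if queues[k] and len(picked) < budget:
                picked.append(int(queues[k].pop(0)))
    return picked

def selective_query(probs, picked, quantile=0.9):
    # Assumption: the adaptive threshold for class c is a quantile of the
    # confidences of all samples the VLM predicts as c. Samples at or above
    # their class threshold keep the VLM's pseudo-label; the rest are sent
    # to the human annotator and consume labeling budget.
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    thresholds = np.ones(probs.shape[1])  # never-predicted classes: always query
    for c in range(probs.shape[1]):
        mask = preds == c
        if mask.any():
            thresholds[c] = np.quantile(conf[mask], quantile)
    to_query = [i for i in picked if conf[i] < thresholds[preds[i]]]
    pseudo = {i: int(preds[i]) for i in picked if conf[i] >= thresholds[preds[i]]}
    return to_query, pseudo
```

In one active learning round under these assumptions, `zero_shot_probs` supplies confidences, `class_guided_kmeans` groups the unlabeled pool, `cluster_balanced_select` picks candidates evenly across clusters, and `selective_query` decides which candidates actually need a human label.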

