CV | May 1

Leveraging Vision-Language Models as Weak Annotators in Active Learning

arXiv:2605.00480 | 9.4 | h-index: 4
AI Analysis

For practitioners in fine-grained recognition, this work reduces reliance on costly human annotation by leveraging VLMs as weak annotators, achieving better performance under limited labeling budgets.

Active learning reduces annotation cost by selectively querying informative samples. This work proposes a framework that combines fine-grained human annotations with coarse-grained VLM-generated weak labels, outperforming existing methods on CUB200 and FGVC-Aircraft under the same annotation budget.

Active learning aims to reduce annotation cost by selectively querying informative samples for supervision under a limited labeling budget. In this work, we investigate how vision-language models (VLMs) can be leveraged to further reduce the reliance on costly human annotation within the active learning paradigm. We find that the reliability of VLMs varies significantly with label granularity in fine-grained recognition tasks: they perform poorly on fine-grained labels but can provide accurate coarse-grained labels. Leveraging this property, we propose an active learning framework that combines fine-grained human annotations with coarse-grained VLM-generated weak labels through instance-wise label assignment. We further model the systematic noise in VLM-generated labels using a small set of trusted full labels. Experiments on CUB200 and FGVC-Aircraft show that the proposed framework consistently outperforms existing active learning methods under the same annotation budget.
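The abstract does not spell out how labels are assigned or how the VLM noise is modeled, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes (hypothetically) that samples are routed to a human annotator by predictive entropy, that each fine class maps to exactly one coarse class, and that VLM noise is captured by a row-stochastic transition matrix estimated from the trusted set. All function names (`assign_annotators`, `coarse_probs`, `estimate_transition`, `weak_label_nll`) are made up for this example.

```python
import numpy as np


def assign_annotators(fine_probs, human_budget):
    """Route the most uncertain samples to humans, the rest to the VLM.

    Assumption: predictive entropy is the selection criterion; the paper
    may use a different one.
    """
    entropy = -(fine_probs * np.log(fine_probs + 1e-12)).sum(axis=1)
    order = np.argsort(-entropy)  # most uncertain first
    return order[:human_budget], order[human_budget:]


def coarse_probs(fine_probs, fine_to_coarse, n_coarse):
    """Sum fine-class probabilities within each coarse group."""
    out = np.zeros((fine_probs.shape[0], n_coarse))
    for fine_idx, coarse_idx in enumerate(fine_to_coarse):
        out[:, coarse_idx] += fine_probs[:, fine_idx]
    return out


def estimate_transition(true_coarse, vlm_coarse, n_coarse, smoothing=1.0):
    """Estimate T[i, j] ~ P(VLM says j | true coarse class is i).

    Counts come from a small trusted set where both the true label and
    the VLM label are known; Laplace smoothing avoids zero rows.
    """
    counts = np.full((n_coarse, n_coarse), float(smoothing))
    np.add.at(counts, (np.asarray(true_coarse), np.asarray(vlm_coarse)), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)


def weak_label_nll(fine_probs, vlm_coarse, fine_to_coarse, transition):
    """Noise-aware negative log-likelihood of the observed VLM labels."""
    p_true = coarse_probs(fine_probs, fine_to_coarse, transition.shape[0])
    p_observed = p_true @ transition  # marginalize over the true coarse class
    vlm_coarse = np.asarray(vlm_coarse)
    picked = p_observed[np.arange(len(vlm_coarse)), vlm_coarse]
    return float(-np.log(picked).mean())


# Toy setup: 4 fine classes grouped into 2 coarse classes.
fine_to_coarse = np.array([0, 0, 1, 1])
fine_probs = np.array([[0.7, 0.1, 0.1, 0.1],
                       [0.25, 0.25, 0.25, 0.25]])

human_idx, vlm_idx = assign_annotators(fine_probs, human_budget=1)
transition = estimate_transition([0, 0, 1, 1], [0, 1, 1, 1], n_coarse=2)
loss = weak_label_nll(fine_probs[vlm_idx], [0], fine_to_coarse, transition)
```

In this toy run the uniform prediction (the second sample) is sent to the human annotator, while the confident one receives a coarse VLM label whose loss is computed through the estimated transition matrix. In a real training loop the same loss would be written in a differentiable framework and added to the standard cross-entropy on human-labeled samples.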

