CV · Jun 20, 2022

DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations

arXiv:2206.09541v1 · 177 citations · h-index: 83
Originality: Incremental advance
AI Analysis

The paper addresses multi-label recognition in low-label regimes, with applications such as image tagging. The advance is incremental because it builds on existing vision-language alignment methods.

The paper tackles multi-label image recognition with limited annotations by proposing DualCoOp, a framework that builds on pretrained vision-language models and learns lightweight prompts for positive and negative contexts, reporting improved accuracy over state-of-the-art methods on standard benchmarks.

Solving multi-label recognition (MLR) for images in the low-label regime is a challenging task with many real-world applications. Recent work learns an alignment between textual and visual spaces to compensate for insufficient image labels, but loses accuracy because of the limited amount of available MLR annotations. In this work, we utilize the strong alignment of textual and visual features pretrained with millions of auxiliary image-text pairs and propose Dual Context Optimization (DualCoOp) as a unified framework for partial-label MLR and zero-shot MLR. DualCoOp encodes positive and negative contexts with class names as part of the linguistic input (i.e. prompts). Since DualCoOp only introduces a very light learnable overhead upon the pretrained vision-language framework, it can quickly adapt to multi-label recognition tasks that have limited annotations and even unseen classes. Experiments on standard multi-label recognition benchmarks across two challenging low-label settings demonstrate the advantages of our approach over state-of-the-art methods.
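
The dual-prompt idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the random class-name embeddings, the mean-pooling `encode_text` function, the sizes `EMBED_DIM` and `NUM_CTX`, and the "positive minus negative similarity" scoring rule are all assumptions made for illustration, standing in for a frozen pretrained vision-language model. The sketch only shows that each class gets a positive and a negative prompt built from a shared learnable context plus the class name, and that the contexts are the only trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 16   # hypothetical embedding width
NUM_CTX = 4      # hypothetical number of learnable context tokens per prompt
CLASS_NAMES = ["person", "dog", "bicycle"]

# Stand-in for frozen class-name token embeddings from a pretrained model.
class_tokens = rng.normal(size=(len(CLASS_NAMES), EMBED_DIM))

# The only learnable parameters in this sketch: one positive and one
# negative context, each shared across all classes.
pos_ctx = rng.normal(scale=0.02, size=(NUM_CTX, EMBED_DIM))
neg_ctx = rng.normal(scale=0.02, size=(NUM_CTX, EMBED_DIM))


def build_prompts(ctx, class_tokens):
    """Prepend the shared context tokens to each class-name token.

    Returns an array of shape (num_classes, NUM_CTX + 1, EMBED_DIM).
    """
    num_classes = class_tokens.shape[0]
    tiled = np.broadcast_to(ctx, (num_classes,) + ctx.shape)
    return np.concatenate([tiled, class_tokens[:, None, :]], axis=1)


def encode_text(prompts):
    """Toy text encoder (assumption): mean-pool tokens, then L2-normalise."""
    pooled = prompts.mean(axis=1)
    return pooled / np.linalg.norm(pooled, axis=-1, keepdims=True)


def class_scores(image_feat, pos_ctx, neg_ctx, class_tokens):
    """Per-class score: similarity to the positive prompt minus similarity
    to the negative prompt (an illustrative scoring rule)."""
    image_feat = image_feat / np.linalg.norm(image_feat)
    pos = encode_text(build_prompts(pos_ctx, class_tokens)) @ image_feat
    neg = encode_text(build_prompts(neg_ctx, class_tokens)) @ image_feat
    return pos - neg


# Stand-in for a frozen image encoder's output.
image_feat = rng.normal(size=EMBED_DIM)
scores = class_scores(image_feat, pos_ctx, neg_ctx, class_tokens)
predicted = [name for name, s in zip(CLASS_NAMES, scores) if s > 0]
```

In this sketch the trainable overhead is `2 * NUM_CTX * EMBED_DIM` values regardless of the number of classes, which mirrors the abstract's point about a very light learnable overhead. Scoring an unseen class only requires appending its class-name embedding to `class_tokens`.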

Code Implementations: 1 repo
