CV · Mar 9, 2023

M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios

arXiv:2303.05122v3 · 2 citations · h-index: 70
Originality: Incremental advance
AI Analysis

This paper addresses label bias in open-set recognition for vision-language models, enabling more reliable classification in real-world scenarios that contain unknown classes. The advance is incremental, since it builds on existing prompt tuning methods.

The paper tackles label bias in vision-language prompt learning for open-set recognition, where models incorrectly assign inputs from unknown classes to known ones. It proposes M-Tuning, together with a Combinatorial Tuning and Testing strategy, to mitigate this bias, and reports state-of-the-art performance across datasets of various scales.

In realistic open-set scenarios where the labels of part of the test data are entirely unknown, vision-language (VL) prompt learning methods that encounter inputs from unknown classes (i.e., classes not seen during training) always predict them as one of the training classes. This label bias hinders open-set recognition (OSR), in which an image should be predicted either as one of the known classes or as unknown. To achieve this goal, we propose a vision-language prompt tuning method with mitigated label bias (M-Tuning). It introduces open words from WordNet, extending the vocabulary that forms the prompt texts beyond the closed-set label words, so that prompts are tuned in a simulated open-set scenario. In addition, inspired by the observation that classifying directly on large datasets causes a much higher false positive rate than on small datasets, we propose a Combinatorial Tuning and Testing (CTT) strategy to improve performance. CTT decomposes M-Tuning on large datasets into multiple independent group-wise tuning runs over fewer classes, then makes accurate and comprehensive predictions by selecting the optimal sub-prompt. Finally, given the lack of VL-based OSR baselines in the literature, especially for prompt methods, we contribute new baselines for fair comparison. Our method achieves the best performance on datasets of various scales, and extensive ablation studies also validate its effectiveness.
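The group-wise idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the helper names (`split_into_groups`, `predict_open_set`), the softmax scoring, and the rule for picking the best group are all assumptions made for illustration, and the similarity scores stand in for the output of a hypothetical vision-language model. Each group scores an image against its own known classes plus a shared set of open words; a group votes for a known class only if that class beats every open word, and the image is labeled unknown when no group does.

```python
import math

UNKNOWN = "unknown"


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_into_groups(classes, group_size):
    """Split the closed-set label list into consecutive groups of fewer classes."""
    return [classes[i:i + group_size] for i in range(0, len(classes), group_size)]


def predict_open_set(group_scores, groups, open_words):
    """Pick a label from per-group scores (hypothetical selection rule).

    group_scores[g] holds one similarity score per entry of
    groups[g] + open_words, in that order.
    """
    best_label, best_prob = UNKNOWN, 0.0
    for known, scores in zip(groups, group_scores):
        if len(scores) != len(known) + len(open_words):
            raise ValueError("one score per known class and open word is required")
        probs = softmax(scores)
        top = max(range(len(probs)), key=probs.__getitem__)
        # A group only votes when a known class outranks every open word.
        if top < len(known) and probs[top] > best_prob:
            best_label, best_prob = known[top], probs[top]
    return best_label


# Assumed toy data: four known classes in two groups, two open words.
groups = split_into_groups(["cat", "dog", "car", "bus"], 2)
open_words = ["tree", "river"]

dog_scores = [[1.0, 5.0, 0.5, 0.2], [0.1, 0.2, 3.0, 0.5]]
novel_scores = [[0.1, 0.2, 4.0, 0.3], [0.2, 0.1, 0.3, 4.0]]

print(predict_open_set(dog_scores, groups, open_words))
print(predict_open_set(novel_scores, groups, open_words))
```

In this sketch the open words act as competitors that absorb probability mass from inputs outside a group's classes, which is one plausible reading of how tuning in a simulated open-set scenario reduces false positives.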

