CV | May 11, 2024

TAI++: Text as Image for Multi-Label Image Classification by Co-Learning Transferable Prompt

arXiv:2405.06926v1 | 19 citations | h-index: 4 | Has Code | IJCAI
Originality: Incremental advance
AI Analysis

This work addresses multi-label image classification for computer vision applications, presenting an incremental improvement over existing prompt tuning methods.

The paper tackles the limited applicability of existing prompt tuning methods for multi-label image classification by proposing a pseudo-visual prompt (PVP) module and a co-learning strategy, reporting results that surpass state-of-the-art methods on the VOC2007, MS-COCO, and NUSWIDE datasets.

The recent introduction of prompt tuning based on pre-trained vision-language models has dramatically improved the performance of multi-label image classification. However, some existing strategies that have been explored still have drawbacks, i.e., either exploiting massive labeled visual data at a high cost or using text data only for text prompt tuning and thus failing to learn the diversity of visual knowledge. Hence, the application scenarios of these methods are limited. In this paper, we propose a pseudo-visual prompt (PVP) module for implicit visual prompt tuning to address this problem. Specifically, we first learn the pseudo-visual prompt for each category, mining diverse visual knowledge by the well-aligned space of pre-trained vision-language models. Then, a co-learning strategy with a dual-adapter module is designed to transfer visual knowledge from pseudo-visual prompt to text prompt, enhancing their visual representation abilities. Experimental results on VOC2007, MS-COCO, and NUSWIDE datasets demonstrate that our method can surpass state-of-the-art (SOTA) methods across various settings for multi-label image classification tasks. The code is available at https://github.com/njustkmg/PVP.
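
The abstract's core idea, one learned prompt vector per category living in the shared embedding space of a vision-language model, can be illustrated with a toy scoring routine. The sketch below is not the authors' implementation: the function names, the 3-dimensional vectors, and the fixed 0.5 threshold are all assumptions made for illustration. It shows only how an input feature might be compared against per-category prompt vectors by cosine similarity to yield a multi-label prediction. The learning of the prompts, the dual-adapter module, and the co-learning strategy are left out.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_scores(feature, class_prompts):
    """Return the cosine similarity between one feature and each class prompt.

    `class_prompts` maps a category name to its (hypothetical) prompt vector.
    """
    unit_feature = l2_normalize(feature)
    scores = {}
    for name, prompt in class_prompts.items():
        unit_prompt = l2_normalize(prompt)
        scores[name] = sum(a * b for a, b in zip(unit_feature, unit_prompt))
    return scores


def predict_labels(feature, class_prompts, threshold=0.5):
    """Multi-label decision: keep every category whose score reaches the threshold."""
    scores = cosine_scores(feature, class_prompts)
    return sorted(name for name, score in scores.items() if score >= threshold)


# Toy data (assumed): three categories with hand-picked prompt vectors.
toy_prompts = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}
toy_feature = [0.7, 0.7, 0.1]  # lies close to both "cat" and "dog"

predicted = predict_labels(toy_feature, toy_prompts, threshold=0.5)
print(predicted)
```

Because each category is scored independently, several labels can be active at once, which is what distinguishes this setting from single-label classification with a softmax over categories.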

Code Implementations: 1 repo
