CV | Dec 12, 2024

Advancing Textual Prompt Learning with Anchored Attributes

arXiv:2412.09442v4 | 20 citations | h-index: 10 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses a specific limitation of textual prompt learning for vision-language models: prompts trained on known categories do not align images with unknown categories. It is an incremental advance, delivered as a plug-in technique for existing textual-based methods.

The paper tackles a limitation of existing textual prompt learning methods: they cannot align images with unknown categories. It introduces an Attribute-anchored Textual Prompt learning method (ATPrompt) that uses universal attributes as a bridge, and it reports general improvements across 11 datasets at negligible computational cost.

Textual-based prompt learning methods primarily employ multiple learnable soft prompts and hard class tokens in a cascading manner as text inputs, aiming to align image and text (category) spaces for downstream tasks. However, current training is restricted to aligning images with predefined known categories and cannot associate them with unknown categories. In this work, we propose utilizing universal attributes as a bridge to enhance the alignment between images and unknown categories. Specifically, we introduce an Attribute-anchored Textual Prompt learning method for vision-language models, named ATPrompt. This approach expands the learning space of soft prompts from the original one-dimensional category level into the multi-dimensional attribute level by incorporating multiple attribute tokens into the learnable soft prompts. Through this modification, we transform the text prompt from a category-centric form to an attribute-category hybrid form. Additionally, we introduce a straightforward differentiable attribute search method to identify representative and suitable attributes for downstream tasks. As an easy-to-use plug-in technique, ATPrompt can seamlessly replace the existing basic prompt format in textual-based methods, providing general improvements at a negligible computational cost. Extensive experiments across 11 datasets validate the effectiveness of our method. Code is publicly available at https://github.com/zhengli97/ATPrompt.
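
The shift from a category-centric prompt to an attribute-category hybrid prompt can be illustrated with a small token-layout sketch. This is not the authors' implementation: the function names, the bracketed placeholders (which stand in for learnable soft-prompt embeddings), the token counts, the example attributes "color" and "shape", and the exact ordering of attribute and class segments are all assumptions made for illustration.

```python
from typing import List


def category_prompt(class_name: str, n_soft: int = 4) -> List[str]:
    """Category-centric layout: soft tokens followed by the hard class token.

    Each "[V*]" string is a placeholder for one learnable embedding.
    """
    soft = [f"[V{i}]" for i in range(1, n_soft + 1)]
    return soft + [class_name]


def attribute_anchored_prompt(
    class_name: str,
    attributes: List[str],
    n_attr_soft: int = 2,
    n_class_soft: int = 4,
) -> List[str]:
    """Hybrid layout (assumed ordering): attribute segments, then the class segment.

    Each attribute word is a fixed (hard) token preceded by its own
    placeholder soft tokens "[A<k>_<i>]".
    """
    tokens: List[str] = []
    for k, attr in enumerate(attributes, start=1):
        tokens += [f"[A{k}_{i}]" for i in range(1, n_attr_soft + 1)]
        tokens.append(attr)  # hard attribute token anchoring this segment
    tokens += [f"[V{i}]" for i in range(1, n_class_soft + 1)]
    tokens.append(class_name)  # hard class token, as in the baseline layout
    return tokens


if __name__ == "__main__":
    print(" ".join(category_prompt("cat")))
    print(" ".join(attribute_anchored_prompt("cat", ["color", "shape"])))
```

In a real model the placeholders would be trainable vectors concatenated with the embeddings of the attribute and class words before the text encoder; the sketch shows only how the sequence is arranged.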

Code Implementations: 1 repo
