PartSeg: Few-shot Part Segmentation via Part-aware Prompt Learning
This work addresses few-shot part segmentation, the problem of segmenting object parts from minimal labeled data, and represents an incremental advance in multimodal learning for computer vision.
The paper tackles few-shot part segmentation by developing PartSeg, a method using part-aware prompt learning with CLIP to generate part-specific prompts and establish relationships between parts across categories, achieving state-of-the-art performance on PartImageNet and Pascal_Part datasets.
In this work, we address the task of few-shot part segmentation, which aims to segment the different parts of an unseen object using very few labeled examples. We find that leveraging the textual space of a powerful pre-trained image-language model (such as CLIP) is beneficial for learning visual features. We therefore develop a novel method, termed PartSeg, for few-shot part segmentation based on multimodal learning. Specifically, we design a part-aware prompt learning method that generates part-specific prompts, enabling the CLIP model to better understand the concept of ``part'' and to fully utilize its textual space. Furthermore, since the concept of a given part generalizes across object categories, we establish relationships between instances of the same part under different categories during the prompt learning process. We conduct extensive experiments on the PartImageNet and Pascal$\_$Part datasets, and the experimental results demonstrate that our proposed method achieves state-of-the-art performance.
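To make the idea of part-specific prompts concrete, the following is a minimal toy sketch and not the PartSeg implementation. It assumes a prompt is built from a few shared context tokens, an object token, and a part token, and that the part token is keyed by part name alone, so the same part is tied across object categories. The names `PartPromptBank`, `mean_pool`, and `segment`, the sizes `DIM` and `N_CTX`, and the mean-pooling stand-in for CLIP's frozen text encoder are all hypothetical choices made for illustration. Training of the tokens is omitted.

```python
import math
import random

random.seed(0)
DIM = 8     # toy embedding width (assumption; CLIP's real width is much larger)
N_CTX = 4   # number of shared context tokens (assumption)


def rand_vec(dim=DIM):
    """Random vector standing in for a token that would be learned."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


def normalize(v):
    """L2-normalize a vector so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def mean_pool(tokens):
    """Hypothetical stand-in for a frozen text encoder: average, then normalize."""
    dim = len(tokens[0])
    return normalize([sum(t[i] for t in tokens) / len(tokens) for i in range(dim)])


class PartPromptBank:
    """Context tokens plus one token per object name and one per part name.

    Part tokens are keyed by part name only, so the "head" of a dog and the
    "head" of a bird reuse the same vector (an assumed form of part sharing).
    """

    def __init__(self, part_names, object_names):
        self.context = [rand_vec() for _ in range(N_CTX)]
        self.part_tokens = {p: rand_vec() for p in part_names}
        self.object_tokens = {o: rand_vec() for o in object_names}

    def prompt_embedding(self, obj, part):
        # Prompt layout: [context tokens..., object token, part token].
        tokens = self.context + [self.object_tokens[obj], self.part_tokens[part]]
        return mean_pool(tokens)


def segment(pixel_features, bank, obj, parts):
    """Assign each pixel feature the part whose prompt embedding is most similar."""
    prompts = {p: bank.prompt_embedding(obj, p) for p in parts}
    labels = []
    for feature in pixel_features:
        f = normalize(feature)
        scores = {p: sum(a * b for a, b in zip(f, e)) for p, e in prompts.items()}
        labels.append(max(scores, key=scores.get))
    return labels


parts = ["head", "body", "leg"]
bank = PartPromptBank(parts, ["dog", "bird"])

# Toy "pixel features": reuse the dog prompt embeddings so the correct label is known.
pixels = [bank.prompt_embedding("dog", p) for p in parts]
labels = segment(pixels, bank, "dog", parts)
```

In this sketch each toy pixel feature is matched back to the part prompt it was copied from, and a prompt for an object such as "bird" reuses the same part tokens without adding new ones. In a real system the tokens would be optimized through the text encoder against labeled support images, and the pixel features would come from the image encoder.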