CV | Oct 26, 2023

Prototypical Contrastive Learning-based CLIP Fine-tuning for Object Re-identification

arXiv:2310.17218v3 | 18 citations | h-index: 5 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficiently adapting CLIP for object re-identification tasks, which is important for applications like surveillance and robotics, but it is incremental as it builds on existing CLIP-based methods.

The authors address the problem of adapting large-scale pre-trained vision-language models such as CLIP to object re-identification (Re-ID). Their method fine-tunes the image encoder with a prototypical contrastive learning loss, which removes the need for prompt learning. It achieves competitive performance in supervised settings and state-of-the-art results in unsupervised scenarios on person and vehicle datasets.

This work aims to adapt large-scale pre-trained vision-language models, such as contrastive language-image pretraining (CLIP), to enhance the performance of object re-identification (Re-ID) across various supervision settings. Although prompt learning has enabled a recent work named CLIP-ReID to achieve promising performance, the underlying mechanisms and the necessity of prompt learning remain unclear due to the absence of semantic labels in Re-ID tasks. In this work, we first analyze the role of prompt learning in CLIP-ReID and identify its limitations. Based on our investigations, we propose a simple yet effective approach to adapt CLIP for supervised object Re-ID. Our approach directly fine-tunes the image encoder of CLIP using a prototypical contrastive learning (PCL) loss, eliminating the need for prompt learning. Experimental results on both person and vehicle Re-ID datasets demonstrate the competitiveness of our method compared to CLIP-ReID. Furthermore, we extend our PCL-based CLIP fine-tuning approach to unsupervised scenarios, where we achieve state-of-the-art performance. Code is available at https://github.com/RikoLi/PCL-CLIP.
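
To make the idea of a prototypical contrastive loss concrete, here is a minimal sketch in plain Python. It is not taken from the PCL-CLIP repository and may differ from the paper's exact formulation. It assumes that each identity is represented by a prototype computed as the re-normalized mean of its L2-normalized features, that similarity is cosine similarity, and that the loss is a softmax cross-entropy over prototype similarities. The function names (`l2_normalize`, `build_prototypes`, `pcl_loss`) and the temperature value of 0.05 are hypothetical choices for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_prototypes(features, labels):
    """Assumed prototype definition: the re-normalized mean of the
    L2-normalized features that share an identity label."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        feat = l2_normalize(feat)
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [a + b for a, b in zip(sums[label], feat)]
        counts[label] += 1
    return {
        label: l2_normalize([a / counts[label] for a in total])
        for label, total in sums.items()
    }


def pcl_loss(feature, label, prototypes, temperature=0.05):
    """Softmax cross-entropy over cosine similarities to all prototypes.
    The loss is small when the feature is closest to its own prototype."""
    feat = l2_normalize(feature)
    logits = {
        y: sum(a * b for a, b in zip(feat, proto)) / temperature
        for y, proto in prototypes.items()
    }
    # Numerically stable log-sum-exp for the softmax denominator.
    peak = max(logits.values())
    log_denominator = peak + math.log(
        sum(math.exp(v - peak) for v in logits.values())
    )
    return log_denominator - logits[label]


# Toy example: two identities with two 2-D "image features" each.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
prototypes = build_prototypes(features, labels)
loss_matching = pcl_loss([1.0, 0.0], 0, prototypes)
loss_mismatched = pcl_loss([1.0, 0.0], 1, prototypes)
```

In this toy setup, `loss_matching` is much smaller than `loss_mismatched`, because the query feature lies close to the prototype of identity 0. In actual fine-tuning, the features would come from the CLIP image encoder and the loss would be minimized by gradient descent. In the unsupervised case, the identity labels would have to be replaced by pseudo-labels, for example from clustering; the abstract does not specify how they are obtained.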

