CV, Jul 8, 2024

Learning to Adapt Category Consistent Meta-Feature of CLIP for Few-Shot Classification

arXiv:2407.05647

Originality: Incremental advance

AI Analysis

This work addresses the problem of few-shot classification for unseen images, offering an incremental improvement by integrating local and high-level features.

The paper tackles few-shot image classification by combining low-level local representations with high-level semantic features from CLIP. It reports better results than state-of-the-art CLIP-based few-shot methods, including on a set of challenging visual classification tasks.

Recent CLIP-based methods have shown promising zero-shot and few-shot performance on image classification tasks. Existing approaches such as CoOp and Tip-Adapter focus only on high-level visual features, which are fully aligned with textual features representing the "summary" of the image. However, the goal of few-shot learning is to classify unseen images of the same category from only a few labeled samples, and, in contrast to high-level representations, low-level local representations (LRs) are more consistent between seen and unseen samples. Based on this observation, we propose the Meta-Feature Adaption method (MF-Adapter), which combines the complementary strengths of LRs and high-level semantic representations. Specifically, we introduce the Meta-Feature Unit (MF-Unit), a simple yet effective local similarity metric that measures category-consistent local context in an inductive manner. We then train an MF-Adapter to map image features to the MF-Unit so that intra-class knowledge generalizes adequately between unseen images and the support set. Extensive experiments show that the proposed method outperforms state-of-the-art CLIP-based downstream few-shot classification methods, with especially strong performance on a set of challenging visual classification tasks.
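The abstract does not specify how the MF-Unit computes its local similarity or how the MF-Adapter is trained. As a rough illustration only, the sketch below assumes each image is described by one global (high-level) embedding plus a set of local feature vectors. It scores each class as a weighted sum of (i) the cosine similarity between the query's global embedding and the class text embedding, as in CLIP zero-shot classification, and (ii) a best-match local similarity to that class's support images. The local metric is borrowed from local-descriptor few-shot methods such as DN4 and is not claimed to be the paper's MF-Unit. The names `local_similarity`, `classify`, `alpha`, and `beta` are hypothetical, and the toy 2-D vectors stand in for real CLIP features.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    # Scale a vector to unit length (guard against the zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity of two vectors.
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def local_similarity(query_locals: List[Vector], support_locals: List[Vector]) -> float:
    # Hypothetical local metric (DN4-style, not the paper's MF-Unit):
    # match each query local feature to its most similar support local
    # feature, then average the best-match scores.
    best = [max(cosine(q, s) for s in support_locals) for q in query_locals]
    return sum(best) / len(best)


def classify(
    query_global: Vector,
    query_locals: List[Vector],
    text_embeddings: Dict[str, Vector],
    support: Dict[str, List[List[Vector]]],  # label -> shots -> local features
    alpha: float = 1.0,  # weight of the high-level (image-text) term
    beta: float = 1.0,   # weight of the local (query-support) term
) -> str:
    scores = {}
    for label, text_emb in text_embeddings.items():
        high_level = cosine(query_global, text_emb)
        shots = support[label]
        local = sum(local_similarity(query_locals, shot) for shot in shots) / len(shots)
        scores[label] = alpha * high_level + beta * local
    return max(scores, key=scores.get)


# Toy usage: the global embedding leans "dog", the local features look like "cat".
text_embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
support = {
    "cat": [[[1.0, 0.1], [0.9, 0.2]]],
    "dog": [[[0.1, 1.0], [0.2, 0.9]]],
}
query_global = [0.6, 0.8]
query_locals = [[1.0, 0.0], [0.95, 0.05]]

print(classify(query_global, query_locals, text_embeddings, support))            # cat
print(classify(query_global, query_locals, text_embeddings, support, beta=0.0))  # dog
```

In this toy example the global embedding alone favors "dog", while the local features closely match the "cat" support image; adding the local term flips the prediction. This mirrors, in a simplified form, the complementarity between local and high-level representations that the abstract argues for.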

