CV | Jun 4, 2023

Retrieval-Enhanced Visual Prompt Learning for Few-shot Classification

CMU
arXiv:2306.02243v3 | 12 citations | h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses domain gaps in fine-grained classification for vision tasks by adding retrieval to few-shot learning with CLIP. The advance is incremental, since the method builds on existing prompt learning baselines.

The paper targets fine-grained few-shot classification with CLIP models, where domain gaps hurt accuracy. It proposes retrieval-enhanced visual prompt learning (RePrompt), which caches knowledge from downstream tasks and reuses it through retrieval. The paper reports state-of-the-art performance on 19 vision datasets spanning image, video, multi-view, and domain generalization benchmarks.

The Contrastive Language-Image Pretraining (CLIP) model has been widely used in various downstream vision tasks, and the few-shot learning paradigm has been widely adopted to augment its capacity for these tasks. However, current paradigms may struggle with fine-grained classification, such as satellite image recognition, due to widening domain gaps. To address this limitation, we propose retrieval-enhanced visual prompt learning (RePrompt), which introduces retrieval mechanisms to cache and reuse the knowledge of downstream tasks. RePrompt constructs a retrieval database from training examples, or from external data if available, and uses a retrieval mechanism to enhance multiple stages of a simple prompt learning baseline, thus narrowing the domain gap. During inference, our enhanced model can reference similar samples brought by retrieval to make more accurate predictions. A detailed analysis reveals that retrieval helps to improve the distribution of late features, thereby improving generalization for downstream tasks. RePrompt attains state-of-the-art performance on a wide range of vision datasets, including 11 image datasets, 3 video datasets, 1 multi-view dataset, and 4 domain generalization benchmarks.
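
The retrieval idea in the abstract (cache features of labeled examples, then consult the most similar cached samples at inference) can be illustrated with a small sketch. This is not the authors' implementation. It covers only an inference-time blend, whereas the paper applies retrieval at multiple stages of prompt learning. The function names, the cosine-similarity k-nearest-neighbour lookup, the softmax weighting, and the `alpha`, `k`, and `temperature` parameters are all assumptions chosen for illustration. Plain NumPy arrays stand in for image features and for the base classifier's logits.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def build_database(features, labels, num_classes):
    """Cache normalized features as keys and one-hot labels as values (hypothetical helper)."""
    keys = l2_normalize(np.asarray(features, dtype=np.float64))
    values = np.eye(num_classes)[np.asarray(labels)]
    return keys, values

def retrieval_vote(query, keys, values, k=3, temperature=0.1):
    """Return a class distribution from the k cached samples most similar to the query."""
    q = l2_normalize(np.asarray(query, dtype=np.float64))
    sims = keys @ q                          # cosine similarity to every cached key
    top = np.argsort(-sims)[:k]              # indices of the k nearest neighbours
    weights = np.exp(sims[top] / temperature)
    weights /= weights.sum()                 # softmax over the retrieved neighbours
    return weights @ values[top]             # similarity-weighted label vote

def predict(query, base_logits, keys, values, alpha=0.5, k=3):
    """Blend the base classifier's softmax with the retrieval vote (alpha is an assumed mixing weight)."""
    base_logits = np.asarray(base_logits, dtype=np.float64)
    base = np.exp(base_logits - base_logits.max())
    base /= base.sum()
    return (1 - alpha) * base + alpha * retrieval_vote(query, keys, values, k=k)

# Toy usage: four cached 2-D "features" from two classes, one query near class 0.
keys, values = build_database(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], [0, 0, 1, 1], num_classes=2
)
probs = predict([0.8, 0.2], [0.0, 0.0], keys, values, alpha=0.5, k=2)
print(probs.argmax())
```

In the toy usage the base logits are uninformative, so the prediction is decided entirely by the retrieved neighbours. With a real model, the blend lets cached downstream samples correct a base classifier that is uncertain under domain shift.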

