CV · AI · MM · Apr 3, 2023

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement

arXiv:2304.01195v1 · 124 citations · h-index: 44
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enhancing CLIP for few-shot learning in vision tasks, offering a more efficient and accurate solution, though it is incremental as it builds upon existing CLIP-based methods.

The paper aims to improve CLIP's few-shot performance on downstream vision tasks. It proposes APE, an adaptive prior refinement method that comes in a training-free variant (APE) and a training-required variant (APE-T) and achieves state-of-the-art accuracy with high computational efficiency. Averaged over 11 benchmarks under 16 shots, APE and APE-T outperform the second-best method by +1.59% and +1.99% respectively, with 30 times fewer learnable parameters.

The popularity of Contrastive Language-Image Pre-training (CLIP) has propelled its application to diverse downstream vision tasks. To improve its capacity on downstream tasks, few-shot learning has become a widely-adopted technique. However, existing methods either exhibit limited performance or suffer from excessive learnable parameters. In this paper, we propose APE, an Adaptive Prior rEfinement method for CLIP's pre-trained knowledge, which achieves superior accuracy with high computational efficiency. Via a prior refinement module, we analyze the inter-class disparity in the downstream data and decouple the domain-specific knowledge from the CLIP-extracted cache model. On top of that, we introduce two model variants, a training-free APE and a training-required APE-T. We explore the trilateral affinities between the test image, prior cache model, and textual representations, and only enable a lightweight category-residual module to be trained. For the average accuracy over 11 benchmarks, both APE and APE-T attain state-of-the-art accuracy and respectively outperform the second-best by +1.59% and +1.99% under 16 shots with 30× fewer learnable parameters.
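The abstract describes refining the cached CLIP features by inter-class disparity and then combining the test image, the cache model, and the text features. The following is a minimal NumPy sketch of that idea under stated assumptions, not the authors' implementation: the channel score (variance of class prototypes across classes), the exponential cache similarity, and every name and hyperparameter here (`select_channels`, `cache_logits`, `alpha`, `beta`) are illustrative choices that do not come from the abstract.

```python
import numpy as np


def select_channels(prototypes, num_keep):
    """Return sorted indices of the channels that vary most across classes.

    prototypes: (num_classes, dim) array of per-class mean features.
    Assumption: per-channel variance across classes stands in for the
    "inter-class disparity" named in the abstract.
    """
    scores = prototypes.var(axis=0)                  # one score per channel
    top = np.argsort(scores)[::-1][:num_keep]        # highest scores first
    return np.sort(top)


def l2_normalize(x):
    """Scale vectors along the last axis to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cache_logits(test_feat, cache_keys, cache_labels, text_feats,
                 channels, alpha=1.0, beta=5.0):
    """Training-free logits: zero-shot text term plus a refined cache term.

    test_feat:    (dim,) image feature of the test sample
    cache_keys:   (num_cache, dim) few-shot image features
    cache_labels: (num_cache, num_classes) one-hot labels of the cache
    text_feats:   (num_classes, dim) class text features
    channels:     indices kept by select_channels
    alpha, beta:  hypothetical weighting and sharpness hyperparameters
    """
    zero_shot = text_feats @ test_feat               # image-text affinity
    q = l2_normalize(test_feat[channels])            # refined test feature
    k = l2_normalize(cache_keys[:, channels])        # refined cache keys
    affinity = np.exp(-beta * (1.0 - k @ q))         # image-cache affinity
    return zero_shot + alpha * (affinity @ cache_labels)
```

In this sketch, `select_channels` would run once on the few-shot class prototypes, and `cache_logits` would then be evaluated per test image; the predicted class is the argmax of the returned vector. No parameters are trained here, which corresponds only loosely to the training-free variant.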

Code Implementations: 1 repo
