CVSep 25, 2022

Collaboration of Pre-trained Models Makes Better Few-shot Learner

arXiv:2209.12255v29 citationsh-index: 82
Originality Incremental advance
AI Analysis

This addresses the problem of data scarcity in few-shot learning for AI applications, though it is incremental by building on existing pre-trained models.

The paper tackles few-shot classification by proposing CoMo, which combines knowledge from CLIP, DINO, and DALL-E to expand data and ensemble predictions, achieving state-of-the-art results on 11 datasets.

Few-shot classification requires deep neural networks to learn generalized representations only from limited training images, which is challenging but significant in low-data regimes. Recently, CLIP-based methods have shown promising few-shot performance benefited from the contrastive language-image pre-training. Based on this point, we question if the large-scale pre-training can alleviate the few-shot data deficiency and also assist the representation learning by the pre-learned knowledge. In this paper, we propose CoMo, a Collaboration of pre-trained Models that incorporates diverse prior knowledge from various pre-training paradigms for better few-shot learning. Our CoMo includes: CLIP's language-contrastive knowledge, DINO's vision-contrastive knowledge, and DALL-E's language-generative knowledge. Specifically, CoMo works in two aspects: few-shot data expansion and diverse knowledge ensemble. For one, we generate synthetic images via zero-shot DALL-E to enrich the few-shot training data without any manpower. For the other, we introduce a learnable Multi-Knowledge Adapter (MK-Adapter) to adaptively blend the predictions from CLIP and DINO. By such collaboration, CoMo can fully unleash the potential of different pre-training methods and unify them to perform state-of-the-art for few-shot classification. We conduct extensive experiments on 11 datasets to demonstrate the superiority and generalization ability of our approach.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes