Mixture-based Feature Space Learning for Few-shot Image Classification
This work improves few-shot image classification accuracy, which benefits researchers and practitioners who must train classifiers from very few labeled examples.
This paper introduces Mixture-based Feature Space Learning (MixtFSL), which learns a robust feature representation for few-shot image classification. MixtFSL trains the feature extractor and learns the mixture model parameters simultaneously, in an online manner. Combined with recent alignment-based approaches, it achieves new state-of-the-art results in the inductive setting, with 5-shot classification accuracies of 82.45% on miniImageNet, 88.20% on tieredImageNet, and 60.70% on FC100 using a ResNet-12 backbone.
We introduce Mixture-based Feature Space Learning (MixtFSL) for obtaining a rich and robust feature representation in the context of few-shot image classification. Previous works have proposed to model each base class either with a single point or with a mixture model by relying on offline clustering algorithms. In contrast, we propose to model base classes with mixture models by simultaneously training the feature extractor and learning the mixture model parameters in an online manner. This results in a richer and more discriminative feature space, which can be employed to classify novel examples from very few samples. The MixtFSL model is trained in two main stages. First, the multimodal mixtures for each base class and the feature extractor parameters are learned using a combination of two loss functions. Second, the resulting network and mixture models are progressively refined through a leader-follower learning procedure, which uses the current estimate as a "target" network. This target network is used to make a consistent assignment of instances to mixture components, which increases performance and stabilizes training. The effectiveness of our end-to-end feature space learning approach is demonstrated with extensive experiments on four standard datasets and four backbones. Notably, we demonstrate that when we combine our robust representation with recent alignment-based approaches, we achieve new state-of-the-art results in the inductive setting, with absolute 5-shot classification accuracies of 82.45% on miniImageNet, 88.20% on tieredImageNet, and 60.70% on FC100 using the ResNet-12 backbone.
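The leader-follower idea described above can be illustrated with a toy sketch. This is not the paper's implementation. It assumes that the "target" (leader) is a frozen snapshot of the current component estimates, that instances are assigned to components by cosine similarity, and that a plain interpolation step stands in for a gradient update. The names `cosine`, `assign_component`, `leader`, and `follower`, along with the toy class and values, are hypothetical.

```python
import copy
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def assign_component(feature, components):
    """Return the index of the component most similar to `feature`."""
    sims = [cosine(feature, c) for c in components]
    return max(range(len(sims)), key=sims.__getitem__)


# Hypothetical toy state: one base class modeled by two components in 2-D.
follower = {"dog": [[1.0, 0.0], [0.0, 1.0]]}

# Assumption: the leader ("target") is a frozen snapshot of the current estimate.
leader = copy.deepcopy(follower)

# A feature vector for one training instance of class "dog" (made-up values).
feature = [0.9, 0.2]

# The assignment is read from the leader, so it stays stable while the
# follower's components move during training.
k = assign_component(feature, leader["dog"])

# Assumption: interpolation toward the feature stands in for a gradient step.
# Only the follower's selected component is updated; the leader is untouched.
lr = 0.5
follower["dog"][k] = [
    (1 - lr) * c + lr * f for c, f in zip(follower["dog"][k], feature)
]
```

Because the assignment comes from the frozen snapshot, the same instance keeps mapping to the same component even as the follower's components drift, which is the kind of consistency the abstract credits with stabilizing training.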