CORL: Compositional Representation Learning for Few-Shot Classification
This work addresses the problem of recognizing unseen classes from very few examples, which is of interest to computer vision researchers. The contribution is incremental, as it builds on existing compositional and meta-learning approaches.
The paper tackles few-shot image classification with a compositional representation learning framework that explicitly models objects as shared components and their spatial composition. It reports comparable performance on the miniImageNet, tieredImageNet, CIFAR-FS, and FC100 benchmarks.
Few-shot image classification consists of two consecutive learning processes: 1) in the meta-learning stage, the model acquires a knowledge base from a set of training classes; 2) during meta-testing, the acquired knowledge is used to recognize unseen classes from very few examples. Inspired by the compositional representation of objects in humans, we train a neural network architecture that explicitly represents objects as a dictionary of shared components and their spatial composition. In particular, during meta-learning, we train a knowledge base that consists of a dictionary of component representations and a dictionary of component activation maps that encode common spatial activation patterns of components. The elements of both dictionaries are shared among the training classes. During meta-testing, the representations of unseen classes are learned using the component representations and the component activation maps from the knowledge base. Finally, an attention mechanism is used to strengthen the components that are most important for each category. We demonstrate the value of our interpretable compositional learning framework for few-shot classification on miniImageNet, tieredImageNet, CIFAR-FS, and FC100, where we achieve comparable performance.
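The meta-testing step described above can be illustrated with a small toy sketch. It is not the paper's implementation: the component dictionary is fixed by hand rather than learned, the second dictionary of shared activation maps is omitted, the "attention" is a simple softmax over each component's mean activation (an assumption made for illustration), and all function names (`activation_maps`, `class_prototype`, `attention`, `classify`) are hypothetical. A feature map is a flattened list of per-position feature vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def activation_maps(feature_map, components):
    """One spatial activation map per component: clipped cosine similarity
    between the component vector and the feature vector at each position."""
    return [[max(cosine(f, c), 0.0) for f in feature_map] for c in components]

def class_prototype(support_feature_maps, components):
    """Average the activation maps of the few support examples of one class."""
    per_example = [activation_maps(fm, components) for fm in support_feature_maps]
    n = len(per_example)
    return [[sum(vals) / n for vals in zip(*maps_k)] for maps_k in zip(*per_example)]

def attention(proto):
    """Assumed stand-in for the attention mechanism: softmax over the mean
    activation of each component in the class prototype."""
    exps = [math.exp(sum(m) / len(m)) for m in proto]
    total = sum(exps)
    return [e / total for e in exps]

def classify(query_feature_map, prototypes, components):
    """Pick the class whose prototype best matches the query, comparing
    activation maps component by component and weighting by attention."""
    q = activation_maps(query_feature_map, components)
    scores = {}
    for label, proto in prototypes.items():
        w = attention(proto)
        scores[label] = sum(wk * cosine(qk, pk) for wk, qk, pk in zip(w, q, proto))
    return max(scores, key=scores.get)

# Toy knowledge base: two shared components in a 2-D feature space.
components = [[1.0, 0.0], [0.0, 1.0]]

# Two unseen classes that use the same components in different spatial layouts.
support = {
    "A": [[[1.0, 0.0], [0.0, 1.0]], [[0.8, 0.2], [0.1, 0.9]]],  # 2-shot
    "B": [[[0.0, 1.0], [1.0, 0.0]]],                            # 1-shot
}
prototypes = {label: class_prototype(fms, components) for label, fms in support.items()}

query = [[0.9, 0.1], [0.2, 0.8]]
print(classify(query, prototypes, components))
```

The two toy classes share the same components and differ only in where those components fire, so the sketch separates them purely by spatial composition, which is the property the framework is built around.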