RepMet: Representative-based metric learning for classification and one-shot object detection
This work addresses few-shot object detection, a key challenge in computer vision when training data are limited. The contribution is incremental in that it builds on existing distance metric learning (DML) and object detection frameworks.
The authors address distance metric learning for object classification and few-shot object detection by proposing RepMet, which learns the backbone parameters, the embedding space, and a multi-modal distribution for each training category in a single end-to-end process. The method outperforms state-of-the-art methods on standard fine-grained classification datasets and, when only a few training examples are available, achieves the best few-shot detection results on ImageNet-LOC among the baselines compared.
Distance metric learning (DML) has been successfully applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples. In this work, we propose a new method for DML that simultaneously learns the backbone network parameters, the embedding space, and the multi-modal distribution of each of the training categories in that space, in a single end-to-end training process. Our approach outperforms state-of-the-art methods for DML-based object classification on a variety of standard fine-grained datasets. Furthermore, we demonstrate the effectiveness of our approach on the problem of few-shot object detection, by incorporating the proposed DML architecture as a classification head into a standard object detection model. We achieve the best results on the ImageNet-LOC dataset compared to strong baselines, when only a few training examples are available. We also offer the community a new episodic benchmark based on the ImageNet dataset for the few-shot object detection task.
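To make the idea of a multi-modal, per-category distribution in the embedding space concrete, the following is a minimal sketch of how an embedded example might be scored against several representatives (modes) per category. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the scoring rule, so the Gaussian-style similarity, the max over modes, the `sigma` value, the function name `class_scores`, and the toy data are all hypothetical.

```python
import math


def class_scores(embedding, representatives, sigma=0.5):
    """Score one embedding against every category.

    representatives[c] is a list of mode vectors for category c
    (assumed layout, chosen for this illustration only).
    Returns one score per category, each in (0, 1].
    """
    scores = []
    for modes in representatives:
        mode_scores = []
        for rep in modes:
            # Squared Euclidean distance between embedding and this mode.
            sq_dist = sum((e - r) ** 2 for e, r in zip(embedding, rep))
            # Assumed Gaussian-style similarity: closer modes score higher.
            mode_scores.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
        # Assumed rule: a category is scored by its best-matching mode.
        scores.append(max(mode_scores))
    return scores


# Hypothetical toy setup: 2 categories, 2 modes each, 2-D embeddings.
reps = [
    [[0.0, 0.0], [1.0, 0.0]],  # category 0
    [[0.0, 3.0], [3.0, 3.0]],  # category 1
]
query = [0.9, 0.1]
scores = class_scores(query, reps)
predicted = max(range(len(scores)), key=lambda c: scores[c])
```

Here the query lies close to the second mode of category 0, so that category receives the highest score. In an end-to-end setting, the representatives would be learned jointly with the backbone rather than fixed by hand as in this toy example.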