Weakly Supervised One-Shot Detection with Attention Similarity Networks
This work addresses object detection with minimal supervision for unseen classes. The contribution is incremental, as it builds on existing Siamese and attention-based approaches.
The paper tackles weakly supervised one-shot detection, where a model must identify and localise instances of an unseen class using only a single exemplar, and shows that the proposed method considerably outperforms baseline methods on datasets from the computer vision and audio domains.
Neural network models that are not conditioned on class identities were shown to facilitate knowledge transfer between classes and to be well-suited for one-shot learning tasks. Following this motivation, we further explore and establish such models and present a novel neural network architecture for the task of weakly supervised one-shot detection. Our model is only conditioned on a single exemplar of an unseen class and a larger target example that may or may not contain an instance of the same class as the exemplar. By pairing a Siamese similarity network with an attention mechanism, we design a model that manages to simultaneously identify and localise instances of classes unseen at training time. In experiments with datasets from the computer vision and audio domains, the proposed method considerably outperforms the baseline methods for the weakly supervised one-shot detection task.
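To make the pairing of a Siamese similarity network with an attention mechanism concrete, the following is a minimal inference-only sketch, not the architecture used in this work. It assumes 1-D signals, a sliding window over the target, and a hand-written stand-in for the shared embedding (mean-centring plus L2 normalisation) where a real model would use a learned network. The names embed, similarity, softmax, detect and the temperature parameter are all hypothetical. The sketch also assumes that, in the weakly supervised setting, only a presence/absence label per exemplar/target pair would be available for training, so the detection score is the only output that would receive a training signal and the location is read off the attention weights.

```python
import math


def embed(window):
    """Shared (Siamese) embedding applied to both exemplar and target windows.

    Stand-in for a learned network: mean-centre, then L2-normalise.
    A constant window maps to the zero vector.
    """
    mean = sum(window) / len(window)
    centred = [x - mean for x in window]
    norm = math.sqrt(sum(c * c for c in centred)) or 1.0  # avoid division by zero
    return [c / norm for c in centred]


def similarity(a, b):
    """Dot product; equals cosine similarity for unit-norm embeddings."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores, temperature=0.1):
    """Numerically stable softmax; a lower temperature gives sharper attention."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def detect(exemplar, target, temperature=0.1):
    """Return (score, location, attention) for one exemplar/target pair.

    score     : attention-weighted similarity, usable as a presence score
    location  : start index of the window with the highest attention
    attention : one weight per candidate window, summing to 1
    """
    k = len(exemplar)
    exemplar_emb = embed(exemplar)
    # Compare the exemplar with every window of the same length in the target.
    sims = [
        similarity(exemplar_emb, embed(target[i:i + k]))
        for i in range(len(target) - k + 1)
    ]
    attention = softmax(sims, temperature)
    score = sum(a * s for a, s in zip(attention, sims))
    location = max(range(len(attention)), key=attention.__getitem__)
    return score, location, attention
```

For example, calling detect([0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]) places the highest attention on the window starting at index 3, where a scaled copy of the exemplar sits, while an all-zero target yields a score of zero and uniform attention. Because the embedding is shared and not tied to any class identity, the same function can be applied to an exemplar of a class never seen before.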