CLAREL: Classification via retrieval loss for zero-shot learning
This work addresses zero-shot learning for fine-grained classification, offering incremental performance improvements on the CUB and FLOWERS datasets.
The paper tackles the problem of learning fine-grained cross-modal representations for zero-shot learning by proposing an instance-based deep metric learning approach. It shows that per-image semantic supervision improves zero-shot performance over class-only supervision. It also provides a probabilistic justification for metric rescaling, which addresses the tendency to classify test images from unseen classes as seen classes. On the generalized zero-shot classification task, CLAREL outperforms existing approaches on the CUB and FLOWERS datasets.
We address the problem of learning fine-grained cross-modal representations. We propose an instance-based deep metric learning approach in a joint visual and textual space. The key novelty of this paper is showing that per-image semantic supervision leads to a substantial improvement in zero-shot performance over class-only supervision. In addition, we provide a probabilistic justification for a metric rescaling approach that solves a very common problem in the generalized zero-shot learning setting: test images from unseen classes being classified as one of the classes seen during training. We evaluate our approach on two fine-grained zero-shot learning datasets: CUB and FLOWERS. We find that on the generalized zero-shot classification task, CLAREL consistently outperforms the existing approaches on both datasets.
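To make the two ideas in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes that the instance-level retrieval loss can be illustrated as a symmetric cross-entropy over a batch of image-text dot-product similarities, and that metric rescaling can be illustrated by multiplying distances to seen-class prototypes by a factor greater than one. The function names `retrieval_loss` and `rescaled_predict`, the dot-product similarity, and the multiplicative `scale` are all hypothetical choices made for illustration.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of a list of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def retrieval_loss(img_emb, txt_emb):
    """Symmetric retrieval loss over a batch of paired embeddings.

    Assumption: image i and text i form the only matching pair, and
    similarity is a plain dot product. The loss averages the
    image-to-text and text-to-image cross-entropies.
    """
    n = len(img_emb)
    sim = [[sum(a * b for a, b in zip(u, v)) for v in txt_emb] for u in img_emb]
    sim_t = [list(col) for col in zip(*sim)]  # text-to-image view
    i2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    t2i = -sum(log_softmax(sim_t[i])[i] for i in range(n)) / n
    return 0.5 * (i2t + t2i)


def rescaled_predict(distances, seen_mask, scale):
    """Pick the nearest class after penalizing seen classes.

    Assumption: distances to seen-class prototypes are multiplied by
    `scale` (> 1 favors unseen classes); unseen distances are unchanged.
    """
    adjusted = [d * scale if seen else d for d, seen in zip(distances, seen_mask)]
    return min(range(len(adjusted)), key=adjusted.__getitem__)


# Matching pairs should give a lower loss than swapped pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = retrieval_loss(images, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = retrieval_loss(images, [[0.0, 1.0], [1.0, 0.0]])

# Class 0 is seen, class 1 is unseen; rescaling can flip the decision.
without_rescaling = rescaled_predict([0.9, 1.0], [True, False], scale=1.0)
with_rescaling = rescaled_predict([0.9, 1.0], [True, False], scale=1.5)
```

In this sketch the rescaling factor would be treated as a hyperparameter tuned on validation data; how the paper actually selects or justifies it is not specified in the abstract.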