CV · Dec 12, 2022

Resolving Semantic Confusions for Improved Zero-Shot Detection

arXiv:2212.06097v1 · 12 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in zero-shot detection for computer vision applications, offering incremental improvements over existing generative methods.

The paper tackles semantic confusion in zero-shot detection, where models struggle to distinguish between semantically similar unseen classes. It adds a triplet loss and a cyclic-consistency loss to the training of a generative model and reports improved unseen-class detection on the MSCOCO and PASCAL-VOC benchmarks.

Zero-shot detection (ZSD) is a challenging task where we aim to recognize and localize objects simultaneously, even when our model has not been trained with visual samples of a few target ("unseen") classes. Recently, methods employing generative models like GANs have shown some of the best results: a GAN trained on seen-class data generates unseen-class samples from their semantics, enabling vanilla object detectors to recognize unseen objects. However, the problem of semantic confusion remains, where the model is sometimes unable to distinguish between semantically similar classes. In this work, we propose to train a generative model incorporating a triplet loss that acknowledges the degree of dissimilarity between classes and reflects it in the generated samples. Moreover, a cyclic-consistency loss is enforced to ensure that generated visual samples of a class correspond closely to their own semantics. Extensive experiments on two benchmark ZSD datasets, MSCOCO and PASCAL-VOC, demonstrate significant gains over the current ZSD methods, reducing semantic confusion and improving detection for the unseen classes.
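To make the two loss terms in the abstract concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the squared-Euclidean distance, the choice to scale the triplet margin by the cosine dissimilarity of the class semantics, and the linear `regressor` that maps visual features back to semantic space are all assumptions made for illustration.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cosine_dissimilarity(u, v):
    """1 - cosine similarity; larger means the vectors point further apart."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def triplet_loss(anchor, positive, negative, sem_anchor, sem_negative,
                 base_margin=1.0):
    """Hinge triplet loss on visual features.

    Assumption: the margin grows with the semantic dissimilarity between
    the anchor class and the negative class, so that very different classes
    are pushed further apart than closely related ones.
    """
    margin = base_margin * cosine_dissimilarity(sem_anchor, sem_negative)
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def cycle_loss(generated_feature, semantics, regressor):
    """Cycle-consistency term.

    Assumption: `regressor` is a hypothetical linear map (list of rows)
    from visual feature space back to semantic space. The loss is the
    squared error between the reconstructed and the original semantics.
    """
    reconstructed = [sum(w * x for w, x in zip(row, generated_feature))
                     for row in regressor]
    return sq_dist(reconstructed, semantics)


if __name__ == "__main__":
    # Toy 2-D example: generated features for one class and a negative class.
    loss_t = triplet_loss(
        anchor=[1.0, 0.0], positive=[1.0, 0.0], negative=[1.0, 0.5],
        sem_anchor=[1.0, 0.0], sem_negative=[0.0, 1.0],
    )
    loss_c = cycle_loss([0.5, 0.5], [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(loss_t, loss_c)
```

In a training setup of this kind, both terms would typically be added, with weighting coefficients, to the generator's adversarial objective; the weights and the exact form of each term in the paper may differ from this sketch.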

Code Implementations: 1 repo
