CV | AI | Nov 13, 2020

Transductive Zero-Shot Learning using Cross-Modal CycleGAN

arXiv:2011.06850v1
AI Analysis

This work proposes a transductive zero-shot learning approach designed to scale to large numbers of seen classes, which is relevant to computer vision researchers who need to classify unseen classes.

This paper addresses the domain shift problem in transductive zero-shot learning (T-ZSL), where unseen classes and their images are available during training but the correspondence between them is not. The authors propose a new model, Cross-Modal CycleGAN (CM-GAN), which they report achieves state-of-the-art results on the ImageNet T-ZSL task.

In Computer Vision, Zero-Shot Learning (ZSL) aims at classifying unseen classes -- classes for which no matching training image exists. Most ZSL works learn a cross-modal mapping between images and class labels for seen classes. However, the data distributions of seen and unseen classes might differ, causing a domain shift problem. Following this observation, transductive ZSL (T-ZSL) assumes that unseen classes and their associated images are known during training, but not their correspondence. As current T-ZSL approaches do not scale efficiently when the number of seen classes is high, we tackle this problem with a new model for T-ZSL based upon CycleGAN. Our model jointly (i) projects images on their seen class labels with a supervised objective and (ii) aligns unseen class labels and visual exemplars with adversarial and cycle-consistency objectives. We show the efficiency of our Cross-Modal CycleGAN model (CM-GAN) on the ImageNet T-ZSL task, where we obtain state-of-the-art results. We further validate CM-GAN on a language grounding task, and on a new task that we propose: zero-shot sentence-to-image matching on MS COCO.
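To make the joint objective described above more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the architecture, loss forms, or weights, so everything below is an assumption chosen for illustration. The mappings `F` (image to label space) and `G` (label to image space) are assumed to be linear, the supervised term is a squared error, the cycle term is an L1 reconstruction, the adversarial term is a standard non-saturating GAN loss against a hypothetical linear discriminator `d_text`, and `lambda_cyc` and `lambda_adv` are hypothetical weights.

```python
import math

def matvec(W, x):
    """Apply a linear map W (list of rows) to a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def l1_dist(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def supervised_loss(F, seen_pairs):
    """(i) Project seen images onto their label embeddings (assumed squared error)."""
    return sum(sq_dist(matvec(F, x), t) for x, t in seen_pairs) / len(seen_pairs)

def cycle_loss(F, G, images, labels):
    """(ii) Cycle consistency in both directions (assumed L1 reconstruction)."""
    img = sum(l1_dist(matvec(G, matvec(F, x)), x) for x in images) / len(images)
    txt = sum(l1_dist(matvec(F, matvec(G, t)), t) for t in labels) / len(labels)
    return img + txt

def adversarial_loss(F, d_text, images):
    """(ii) F tries to make projected unseen images look like real label
    embeddings to a hypothetical linear discriminator d_text."""
    scores = [sum(d * v for d, v in zip(d_text, matvec(F, x))) for x in images]
    return -sum(log_sigmoid(s) for s in scores) / len(scores)

def total_loss(F, G, d_text, seen_pairs, unseen_images, unseen_labels,
               lambda_cyc=1.0, lambda_adv=1.0):
    """Joint objective: supervised on seen pairs, unpaired alignment on unseen data."""
    return (supervised_loss(F, seen_pairs)
            + lambda_cyc * cycle_loss(F, G, unseen_images, unseen_labels)
            + lambda_adv * adversarial_loss(F, d_text, unseen_images))

def predict(F, image, class_embeddings):
    """Zero-shot prediction: nearest class embedding to the projected image."""
    z = matvec(F, image)
    return min(class_embeddings, key=lambda name: sq_dist(z, class_embeddings[name]))

# Toy 2-D example with identity mappings (illustrative values only).
identity = [[1.0, 0.0], [0.0, 1.0]]
seen = [([1.0, 0.0], [1.0, 0.0])]          # paired (image, label) for a seen class
unseen_imgs = [[0.0, 1.0], [0.2, 0.8]]     # unseen images, no known labels
unseen_lbls = [[0.0, 1.0]]                 # unseen label embeddings, unpaired
loss = total_loss(identity, identity, [0.0, 0.0], seen, unseen_imgs, unseen_lbls)
label = predict(identity, [0.9, 0.1], {"cat": [1.0, 0.0], "dog": [0.0, 1.0]})
```

The point of the sketch is the split in supervision: only `supervised_loss` uses image-label pairs, while the cycle and adversarial terms operate on unseen images and unseen label embeddings without knowing which image belongs to which label. A real model would train `F`, `G`, and the discriminators with gradient-based optimization, which is omitted here.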

