CV · May 9, 2020

Generative Model-driven Structure Aligning Discriminative Embeddings for Transductive Zero-shot Learning

arXiv:2005.04492v1 · 4 citations
AI Analysis

This work addresses the domain shift issue in zero-shot learning for computer vision applications, representing an incremental improvement over existing methods.

The authors tackled the projection domain shift problem in zero-shot learning by proposing a transductive approach that uses unlabeled unseen class data to generate semantic features via a conditional variational auto-encoder, improving projection function learning. They demonstrated superior performance on six standard benchmark datasets in both inductive and transductive settings, including in low-data regimes.

Zero-shot Learning (ZSL) is a transfer learning technique that aims to transfer knowledge from seen classes to unseen classes. This knowledge transfer is possible because of an underlying semantic space that is common to seen and unseen classes. Most existing approaches use labelled seen class data to learn a projection function that maps visual data to semantic data. In this work, we propose a shallow but effective neural network-based model for learning such a projection function, which aligns the visual and semantic data in the latent space while simultaneously making the latent space embeddings discriminative. Because this projection function is learned using only seen class data, the so-called projection domain shift arises. We propose a transductive approach to reduce the effect of domain shift, in which we use unlabeled visual data from unseen classes to generate corresponding semantic features for the unseen class visual samples. These semantic features are initially generated using a conditional variational auto-encoder and are then used along with the seen class data to improve the projection function. We experiment on both the inductive and transductive settings of ZSL and generalized ZSL and show superior performance on the standard benchmark datasets AWA1, AWA2, CUB, SUN, FLO, and APY. We also show the efficacy of our model in regimes with extremely little labelled data on different datasets in the context of ZSL.
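To make the idea of a projection function concrete, here is a minimal sketch of a generic inductive ZSL baseline. It is not the model proposed in the paper: a closed-form ridge regression stands in for the shallow network, and the transductive step with the conditional variational auto-encoder is omitted. The toy data, the helper names (`fit_projection`, `predict_unseen`, `sample`), the dimensions, and the attribute-to-visual map `M` are all assumptions made for illustration.

```python
import numpy as np

def fit_projection(X_seen, S_seen, reg=1.0):
    """Ridge regression: W minimises ||X W - S||^2 + reg * ||W||^2."""
    d = X_seen.shape[1]
    return np.linalg.solve(X_seen.T @ X_seen + reg * np.eye(d), X_seen.T @ S_seen)

def predict_unseen(X, W, unseen_attrs):
    """Project visual features, then pick the unseen class whose
    attribute vector has the highest cosine similarity."""
    P = X @ W
    P = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-12)
    A = unseen_attrs / (np.linalg.norm(unseen_attrs, axis=1, keepdims=True) + 1e-12)
    return np.argmax(P @ A.T, axis=1)

# Toy data (hypothetical): class attributes and a random linear map
# that produces "visual" features from them.
rng = np.random.default_rng(0)
n_seen, n_unseen, attr_dim, feat_dim = 5, 3, 8, 16
attrs = rng.normal(size=(n_seen + n_unseen, attr_dim))
M = rng.normal(size=(attr_dim, feat_dim))

def sample(cls_ids, per_class=20):
    """Draw noisy visual features for each class id."""
    y = np.repeat(cls_ids, per_class)
    X = attrs[y] @ M + 0.1 * rng.normal(size=(len(y), feat_dim))
    return X, y

X_seen, y_seen = sample(np.arange(n_seen))
X_unseen, y_unseen = sample(np.arange(n_seen, n_seen + n_unseen))

# Learn the projection on seen classes only, then classify unseen samples.
W = fit_projection(X_seen, attrs[y_seen])
pred = predict_unseen(X_unseen, W, attrs[n_seen:])
accuracy = float(np.mean(pred == (y_unseen - n_seen)))
print(f"toy unseen-class accuracy: {accuracy:.2f}")
```

Because `W` is fitted on seen classes alone, its behaviour on unseen classes is not guaranteed to carry over, which is the projection domain shift that the abstract's transductive step is meant to reduce.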

