Discriminative Learning of Latent Features for Zero-Shot Recognition
This work addresses zero-shot recognition in computer vision, offering an incremental improvement through discriminative learning of visual and semantic representations.
The paper tackles zero-shot learning by emphasizing the need for discriminative representations in both the visual and semantic spaces. It proposes a network that discovers discriminative regions and learns discriminative semantic representations, and reports significantly outperforming state-of-the-art methods on two challenging datasets.
Zero-shot learning (ZSL) aims to recognize unseen image categories by learning an embedding space between image and semantic representations. For years, the central task in existing work has been to learn proper mapping matrices that align the visual and semantic spaces, while the importance of learning discriminative representations for ZSL has been ignored. In this work, we revisit existing methods and demonstrate the necessity of learning discriminative representations for both visual and semantic instances in ZSL. We propose an end-to-end network that is capable of 1) automatically discovering discriminative regions with a zoom network; and 2) learning discriminative semantic representations in an augmented space introduced for both user-defined and latent attributes. Our proposed method is tested extensively on two challenging ZSL datasets, and the experimental results show that it significantly outperforms state-of-the-art methods.
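To make the embedding-based setup described in the abstract concrete, the following is a minimal sketch of generic zero-shot inference with a mapping matrix. It is not the network proposed in the paper: the zoom network and latent attributes are omitted, and the function names, the cosine scoring rule, the matrix `W`, and the toy class names and attribute values are all illustrative assumptions.

```python
import math

def project(feature, W):
    """Map a d-dim visual feature into the k-dim semantic space (W is d x k)."""
    return [sum(feature[i] * W[i][j] for i in range(len(feature)))
            for j in range(len(W[0]))]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def zero_shot_predict(feature, W, class_attributes):
    """Return the unseen class whose attribute vector best matches the projected feature."""
    embedded = project(feature, W)
    return max(class_attributes,
               key=lambda name: cosine(embedded, class_attributes[name]))

# Hypothetical toy data: 3 attributes, 2 unseen classes, identity mapping.
unseen_classes = {
    "zebra": [1.0, 0.0, 1.0],
    "polar_bear": [0.0, 1.0, 0.0],
}
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

prediction = zero_shot_predict([0.9, 0.1, 0.8], W, unseen_classes)
```

In this sketch the unseen classes are described only by their attribute vectors, so no image of them is needed to score a new image. In practice `W` would be learned from seen classes; an identity matrix is used here only to keep the example short.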