Bi-Adversarial Auto-Encoder for Zero-Shot Learning
This work addresses the challenge of improving visual-semantic interactions in ZSL for computer vision applications, and represents an incremental advance over existing methods.
The paper tackles the problem of unidirectional alignment in generative Zero-Shot Learning (ZSL) by proposing a bi-adversarial auto-encoder for bi-directional visual-semantic alignment, which achieves competitive performance on traditional and generalized ZSL tasks across four benchmark datasets.
Existing generative Zero-Shot Learning (ZSL) methods consider only the unidirectional alignment from class semantics to visual features and ignore the alignment from visual features to class semantics, so they fail to model visual-semantic interactions well. In this paper, we propose to synthesize visual features with an auto-encoder framework paired with bi-adversarial networks, one for the visual modality and one for the semantic modality. The resulting bi-directional alignment reinforces visual-semantic interactions and ensures that the synthesized visual features fit the real visual distribution and are highly related to the semantics. The encoder aims to synthesize real-like visual features, while the decoder forces both the real and the synthesized visual features to be more related to the class semantics. To further capture the discriminative information of the synthesized visual features, both the real and the synthesized visual features are required to be classified into the correct classes by a classification network. Experimental results on four benchmark datasets show that the proposed approach is competitive on both the traditional ZSL and the generalized ZSL tasks.
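To make the data flow concrete, the following is a minimal forward-pass sketch of the five components named above: encoder, decoder, two discriminators, and a classifier. It is an illustration under stated assumptions, not the paper's implementation. Every network is a single linear layer, the adversarial terms use the standard binary cross-entropy GAN loss (the paper may use a different adversarial objective), the discriminators are unconditioned, and all names and dimensions (`synthesize`, `loss_terms`, `SEM_DIM`, and so on) are hypothetical.

```python
import math
import random

random.seed(0)  # reproducible toy weights and noise

# Toy sizes, chosen only for illustration (not taken from the paper).
SEM_DIM, NOISE_DIM, VIS_DIM, N_CLASSES = 4, 2, 6, 3


def init_linear(n_in, n_out):
    """Return (weights, bias) for one linear layer with small random weights."""
    weights = [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(params, x):
    """Apply a linear layer to the vector x (a list of floats)."""
    weights, bias = params
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def prob_real(critic, x):
    """Sigmoid output of a one-unit discriminator: probability that x is real."""
    return 1.0 / (1.0 + math.exp(-linear(critic, x)[0]))


def bce(prob, target):
    """Binary cross-entropy for one probability and a 0/1 target."""
    eps = 1e-12
    return -(target * math.log(prob + eps) + (1 - target) * math.log(1 - prob + eps))


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_norm - logits[label]


# Single-layer stand-ins for the components (an assumption of this sketch).
encoder = init_linear(SEM_DIM + NOISE_DIM, VIS_DIM)  # semantics + noise -> visual feature
decoder = init_linear(VIS_DIM, SEM_DIM)              # visual feature -> semantics
visual_critic = init_linear(VIS_DIM, 1)              # real vs. synthesized visual feature
semantic_critic = init_linear(SEM_DIM, 1)            # true vs. decoded semantics
classifier = init_linear(VIS_DIM, N_CLASSES)         # visual feature -> class logits


def synthesize(semantics):
    """Semantic-to-visual direction: generate one visual feature for a class."""
    noise = [random.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
    return linear(encoder, semantics + noise)


def loss_terms(x_real, semantics, label):
    """Discriminator-side and classification losses for one training sample."""
    x_fake = synthesize(semantics)
    # Visual-to-semantic direction, applied to real and synthesized features.
    sem_from_real = linear(decoder, x_real)
    sem_from_fake = linear(decoder, x_fake)

    visual_adv = (bce(prob_real(visual_critic, x_real), 1)
                  + bce(prob_real(visual_critic, x_fake), 0))
    semantic_adv = (bce(prob_real(semantic_critic, semantics), 1)
                    + 0.5 * (bce(prob_real(semantic_critic, sem_from_real), 0)
                             + bce(prob_real(semantic_critic, sem_from_fake), 0)))
    # Both real and synthesized features should be assigned the correct class.
    classification = (cross_entropy(linear(classifier, x_real), label)
                      + cross_entropy(linear(classifier, x_fake), label))
    return {
        "visual_adversarial": visual_adv,
        "semantic_adversarial": semantic_adv,
        "classification": classification,
    }


if __name__ == "__main__":
    class_semantics = [1.0, 0.0, 0.0, 1.0]  # hypothetical attribute vector
    real_feature = [0.5] * VIS_DIM          # hypothetical real visual feature
    print(loss_terms(real_feature, class_semantics, label=2))
```

In an actual training loop, the two discriminators would minimize the adversarial terms while the encoder and decoder are updated against them. At test time, `synthesize` would be called on unseen-class semantics to produce features for training a final classifier. Both steps are omitted here for brevity.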