CV · Jun 26, 2023

An Integral Projection-based Semantic Autoencoder for Zero-Shot Learning

William Heyden, Habib Ullah, M. Salman Siddiqui, Fadi Al Machot

arXiv:2306.14628v25.05 citations · h-index: 23 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the domain shift problem in zero-shot learning for computer vision, offering an incremental improvement with enhanced interpretability.

The authors tackle the domain shift problem in zero-shot learning by proposing an integral projection-based semantic autoencoder (IP-SAE) that projects visual features concatenated with semantic information into a latent space so that discriminative information is preserved. They report that the model outperforms state-of-the-art methods on four benchmark datasets.

Zero-shot learning (ZSL) classification categorizes or predicts classes (labels) that are not included in the training set (unseen classes). Recent works have proposed semantic autoencoder (SAE) models in which the encoder embeds the visual feature space into the semantic space and the decoder reconstructs the original visual feature space. The objective is to learn the embedding from a source data distribution so that it can be applied effectively to a different but related target data distribution. Such embedding-based methods are prone to domain shift and are vulnerable to biases. We propose an integral projection-based semantic autoencoder (IP-SAE) in which an encoder projects the visual feature space, concatenated with the semantic space, into a latent representation space. We force the decoder to reconstruct the visual-semantic data space. Due to this constraint, the visual-semantic projection function preserves the discriminative information contained in the original visual feature space. The enriched projection forces a more precise reconstruction of the visual feature space that is invariant to the domain manifold. Consequently, the learned projection function is less domain-specific and alleviates the domain shift problem. Our proposed IP-SAE model consolidates a symmetric transformation function for embedding and projection and thus provides transparency for interpreting generative applications in ZSL. Therefore, in addition to outperforming state-of-the-art methods on four benchmark datasets, our analytical approach allows us to investigate distinct characteristics of generative methods in the unique context of zero-shot inference.
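
The idea of encoding a concatenated visual-semantic input and decoding it with the same (symmetric) transformation can be illustrated with a minimal sketch. This is not the authors' IP-SAE objective, which the abstract does not specify. It assumes a purely linear, tied-weight autoencoder, whose reconstruction-optimal weights are given by the top principal directions of the data. The function names (fit_tied_autoencoder, encode, decode) and all dimensions are hypothetical.

```python
import numpy as np

def fit_tied_autoencoder(X, S, latent_dim):
    """Fit a linear tied-weight autoencoder on concatenated inputs.

    X: (n, d) visual features; S: (n, k) semantic vectors, one per sample.
    Assumption: the encoder is W and the decoder is W^T, so the best
    reconstruction is obtained from the top right-singular vectors (PCA).
    """
    Z = np.hstack([X, S])                      # (n, d + k) visual-semantic input
    mean = Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Z - mean, full_matrices=False)
    W = Vt[:latent_dim]                        # (latent_dim, d + k), orthonormal rows
    return W, mean

def encode(Z, W, mean):
    # Project the visual-semantic input into the latent space.
    return (Z - mean) @ W.T

def decode(H, W, mean):
    # Reconstruct the visual-semantic input with the transposed weights.
    return H @ W + mean

# Toy data: 50 samples, 8 visual dims, 3 semantic dims (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
S = rng.normal(size=(50, 3))
W, mean = fit_tied_autoencoder(X, S, latent_dim=5)
Z = np.hstack([X, S])
Z_hat = decode(encode(Z, W, mean), W, mean)
print("reconstruction MSE:", np.mean((Z - Z_hat) ** 2))
```

Because the decoder is the transpose of the encoder, the same matrix W can be inspected in both directions, which is one simple way to read the abstract's point about a symmetric transformation aiding interpretability.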
