CVMar 21, 2024

Enabling Generalized Zero-shot Learning Towards Unseen Domains by Intrinsic Learning from Redundant LLM Semantics

Jiaqi Yue, Chunhui Zhao, Jiancheng Zhao, Biao Huang

arXiv:2403.14362v53.72 citationsh-index: 10Has CodeNeural Networks

Originality Incremental advance

AI Analysis

This work addresses a domain shift problem in zero-shot learning for computer vision, but it appears incremental as it extends existing GZSL methods to unseen domains.

The paper tackles the problem of cross-domain generalized zero-shot learning (CDGZSL) by proposing Meta Domain Alignment Semantic Refinement (MDASR) to align redundant LLM-based semantics with a common feature space, achieving improved performance on datasets like Office-Home and Mini-DomainNet.

Generalized zero-shot learning (GZSL) focuses on recognizing seen and unseen classes against domain shift problem where data of unseen classes may be misclassified as seen classes. However, existing GZSL is still limited to seen domains. In the current work, we study cross-domain GZSL (CDGZSL) which addresses GZSL towards unseen domains. Different from existing GZSL methods, CDGZSL constructs a common feature space across domains and acquires the corresponding intrinsic semantics shared among domains to transfer from seen to unseen domains. Considering the information asymmetry problem caused by redundant class semantics annotated with large language models (LLMs), we present Meta Domain Alignment Semantic Refinement (MDASR). Technically, MDASR consists of two parts: Inter-class similarity alignment, which eliminates the non-intrinsic semantics not shared across all domains under the guidance of inter-class feature relationships, and unseen-class meta generation, which preserves intrinsic semantics to maintain connectivity between seen and unseen classes by simulating feature generation. MDASR effectively aligns the redundant semantic space with the common feature space, mitigating the information asymmetry in CDGZSL. The effectiveness of MDASR is demonstrated on two datasets, Office-Home and Mini-DomainNet, and we have shared the LLM-based semantics for these datasets as a benchmark.

View on arXiv PDF Code

Similar