CV · Jul 12, 2021

Structured Latent Embeddings for Recognizing Unseen Classes in Unseen Domains

arXiv:2107.05622v1 · 6 citations
Originality: Incremental advance
AI Analysis

The paper addresses a real-world challenge for computer vision applications, where annotated data is scarce and domain and semantic shifts occur simultaneously. The advance is incremental, however, because it builds on existing zero-shot learning and domain generalization methods.

The paper tackles the recognition of unseen classes in unseen domains, a setting known as Zero-shot Domain Generalization. It learns domain-agnostic structured latent embeddings that align visual and text-based cues, and it reports significant gains on the DomainNet and DomainNet-LS benchmarks, particularly in difficult domains such as quickdraw and sketch.

The need to address the scarcity of task-specific annotated data has resulted in concerted efforts in recent years for specific settings such as zero-shot learning (ZSL) and domain generalization (DG), to separately address the issues of semantic shift and domain shift, respectively. However, real-world applications often do not have constrained settings and necessitate handling unseen classes in unseen domains -- a setting called Zero-shot Domain Generalization, which presents the issues of domain and semantic shifts simultaneously. In this work, we propose a novel approach that learns domain-agnostic structured latent embeddings by projecting images from different domains as well as class-specific semantic text-based representations to a common latent space. In particular, our method jointly strives for the following objectives: (i) aligning the multimodal cues from visual and text-based semantic concepts; (ii) partitioning the common latent space according to the domain-agnostic class-level semantic concepts; and (iii) learning domain invariance with respect to the visual-semantic joint distribution for generalizing to unseen classes in unseen domains. Our experiments on the challenging DomainNet and DomainNet-LS benchmarks show the superiority of our approach over existing methods, with significant gains on difficult domains like quickdraw and sketch.
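The idea of projecting images and class-level text representations into one latent space can be illustrated with a toy sketch. This is not the authors' architecture or training procedure: the linear projections `W_IMG` and `W_TXT`, the two-dimensional toy vectors, the class names, and the temperature-scaled cross-entropy in `alignment_loss` are all assumptions chosen for illustration. It shows only objective (i), visual-text alignment, and the nearest-class-embedding prediction that makes unseen classes recognizable from their text embeddings alone.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def project(matrix, vec):
    """Apply a linear map; each row of `matrix` produces one latent dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def predict(image_feat, class_text_embs, w_img, w_txt):
    """Score every class by similarity between projected image and projected text."""
    z_img = project(w_img, image_feat)
    scores = {
        name: cosine(z_img, project(w_txt, emb))
        for name, emb in class_text_embs.items()
    }
    return max(scores, key=scores.get), scores

def alignment_loss(scores, target, temperature=0.1):
    """Softmax cross-entropy over similarity scores (an assumed, generic loss)."""
    logits = {k: s / temperature for k, s in scores.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_z - logits[target]

# Hypothetical toy data: identity projections and 2-D features.
W_IMG = [[1.0, 0.0], [0.0, 1.0]]
W_TXT = [[1.0, 0.0], [0.0, 1.0]]
CLASS_TEXT = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}

label, scores = predict([0.9, 0.1], CLASS_TEXT, W_IMG, W_TXT)
print(label, alignment_loss(scores, label))
```

Because prediction depends only on the class text embeddings, a new class can be added to `CLASS_TEXT` without any image examples. Objectives (ii) and (iii) from the abstract, latent-space partitioning and domain invariance, would need additional loss terms that this sketch does not attempt.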

