CV | Mar 20, 2022

VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning

arXiv:2203.10444v2 | 59 citations | h-index: 137
Originality: Highly original
AI Analysis

This work addresses the need for scalable, visually relevant semantic embeddings in zero-shot learning, reducing reliance on costly human annotation.

The paper tackles the problem of generating semantic embeddings for zero-shot learning without human annotation. It discovers embeddings that reflect visual similarity and reports large performance improvements over word embeddings on three benchmarks.

Human-annotated attributes serve as powerful semantic embeddings in zero-shot learning. However, their annotation process is labor-intensive and needs expert supervision. Current unsupervised semantic embeddings, i.e., word embeddings, enable knowledge transfer between classes. However, word embeddings do not always reflect visual similarities and result in inferior zero-shot performance. We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning, without requiring any human annotation. Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity, and further imposes their class discrimination and semantic relatedness. To associate these clusters with previously unseen classes, we use external knowledge, e.g., word embeddings, and propose a novel class relation discovery module. Through quantitative and qualitative evaluation, we demonstrate that our model discovers semantic embeddings that model the visual properties of both seen and unseen classes. Furthermore, we demonstrate on three benchmarks that our visually-grounded semantic embeddings further improve performance over word embeddings across various ZSL models by a large margin.
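The idea of associating visual clusters with unseen classes through word embeddings can be illustrated with a toy sketch. This is not the paper's class relation discovery module. It swaps in a plain softmax over cosine similarities, and every name, vector, and the temperature value below is a hypothetical chosen for illustration. The sketch assumes each seen class already has a visually-grounded embedding (for example, averaged cluster assignments of its image regions) and builds an unseen class's embedding as a similarity-weighted average of the seen ones.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(xs, temperature=1.0):
    """Numerically stable softmax; a lower temperature gives sharper weights."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def unseen_class_embedding(unseen_word_vec, seen_word_vecs, seen_visual_embs,
                           temperature=0.1):
    """Weighted average of seen-class visual embeddings.

    Assumption: weights come from word-embedding cosine similarity passed
    through a softmax. This stands in for a learned class relation module.
    """
    sims = [cosine(unseen_word_vec, w) for w in seen_word_vecs]
    weights = softmax(sims, temperature)
    dim = len(seen_visual_embs[0])
    return [sum(w * emb[d] for w, emb in zip(weights, seen_visual_embs))
            for d in range(dim)]

# Toy data (hypothetical): two seen classes and one unseen class.
seen_word_vecs = [[1.0, 0.0],   # "horse"
                  [0.0, 1.0]]   # "sparrow"
seen_visual_embs = [[0.9, 0.1, 0.0],   # horse: mostly cluster 0
                    [0.0, 0.2, 0.8]]   # sparrow: mostly cluster 2
zebra_word_vec = [0.9, 0.1]            # closer to "horse" in word space

zebra = unseen_class_embedding(zebra_word_vec, seen_word_vecs, seen_visual_embs)
print(zebra)
```

With these toy inputs, the unseen class inherits most of its embedding from the seen class whose word vector is closest, which is the knowledge-transfer behavior the abstract describes at a high level.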

Code Implementations: 1 repo