CV · May 22, 2017

Semantic Softmax Loss for Zero-Shot Learning

arXiv:1705.07692v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of aligning different structural spaces in multimodal frameworks for zero-shot learning, offering an incremental improvement over existing methods.

The authors tackled the problem of Zero-Shot Learning by proposing a Semantic Softmax Loss to better capture semantic interactions between visual features and class descriptors, achieving state-of-the-art performance on benchmark datasets like AwA, CUB, and SUN.

A typical pipeline for Zero-Shot Learning (ZSL) integrates the visual features and the class semantic descriptors into a multimodal framework with a linear or bilinear model. However, because the visual features and the class semantic descriptors lie in different structural spaces, a linear or bilinear model cannot capture the semantic interactions between the modalities well. In this letter, we propose a nonlinear approach that poses ZSL as a multi-class classification problem via a Semantic Softmax Loss, embedding the class semantic descriptors into the softmax layer of a multi-class classification network. To narrow the structural differences between the visual features and the semantic descriptors, we further apply an L2 normalization constraint to the differences between the visual features and the visual prototypes reconstructed from the semantic descriptors. Results on three benchmark datasets (AwA, CUB, and SUN) demonstrate that the proposed approach steadily improves performance and achieves state-of-the-art results for both zero-shot classification and zero-shot retrieval.
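The idea in the abstract can be illustrated with a minimal NumPy sketch. This is one plausible reading of the abstract, not the authors' exact formulation: the names `semantic_softmax_loss`, `predict`, `W`, and `lam` are hypothetical, the visual features are assumed to be already extracted (for example by a nonlinear network), the softmax weights are assumed to be visual prototypes obtained as a learned linear map of the class descriptors, and the L2 constraint is assumed to be a squared-distance penalty between each feature and its class prototype.

```python
import numpy as np


def semantic_softmax_loss(feats, labels, class_desc, W, lam=0.1):
    """Softmax cross-entropy whose class weights come from semantic descriptors.

    feats:      (n, d) visual features (assumed precomputed)
    labels:     (n,)   integer class indices into class_desc
    class_desc: (c, k) class semantic descriptors (attributes, word vectors, ...)
    W:          (k, d) hypothetical learnable map from semantic to visual space
    lam:        weight of the assumed L2 feature-to-prototype penalty
    """
    # Visual prototypes reconstructed from the semantic descriptors.
    prototypes = class_desc @ W                      # (c, d)

    # The prototypes play the role of the softmax layer's weight matrix.
    logits = feats @ prototypes.T                    # (n, c)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    cross_entropy = -log_probs[np.arange(len(labels)), labels].mean()

    # Assumed form of the L2 constraint: mean squared distance between
    # each feature and the prototype of its ground-truth class.
    diff = feats - prototypes[labels]
    l2_penalty = (diff ** 2).sum(axis=1).mean()

    return cross_entropy + lam * l2_penalty


def predict(feats, class_desc, W):
    """Assign each feature to the class whose prototype scores highest."""
    return np.argmax(feats @ (class_desc @ W).T, axis=1)
```

Under these assumptions, zero-shot prediction amounts to calling `predict` with the descriptors of the unseen classes in place of the training descriptors, since `W` is shared across classes and only the descriptors change.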

