CV · Jun 10, 2020

Simple and effective localized attribute representations for zero-shot learning

arXiv:2006.05938v3 · 11 citations
AI Analysis

This work addresses zero-shot learning for image classification with an approach that is simpler and more interpretable than methods relying on complex visual localization.

The paper tackles zero-shot learning by proposing a method that localizes representations in the semantic/attribute space instead of the visual space, achieving state-of-the-art performance on CUB and SUN datasets and competitive results on AWA2.

Zero-shot learning (ZSL) aims to discriminate images from unseen classes by exploiting relations to seen classes via their semantic descriptions. Some recent papers have shown the importance of localized features, together with fine-tuning the feature extractor, for obtaining discriminative and transferable features. However, these methods require complex attention or part detection modules to perform explicit localization in the visual space. In contrast, in this paper we propose localizing representations in the semantic/attribute space, with a simple but effective pipeline where localization is implicit. Focusing on attribute representations, we show that our method obtains state-of-the-art performance on the CUB and SUN datasets, and also achieves competitive results on the AWA2 dataset, outperforming generally more complex methods with explicit localization in the visual space. Our method is easy to implement and can be used as a new baseline for zero-shot learning. In addition, our localized representations are highly interpretable as attribute-specific heatmaps.
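To make the idea of implicit localization in attribute space more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a backbone feature map is already available, projects every spatial location onto each attribute to obtain one heatmap per attribute, keeps the strongest response per attribute, and scores classes by a dot product with their attribute vectors. The function names, the max pooling step, the dot-product compatibility, and the toy numbers are all assumptions made for illustration; the abstract does not specify these details.

```python
def attribute_heatmaps(features, weights):
    """Project each location's feature vector onto every attribute.

    features: H x W x D nested lists (feature map from some backbone; assumed given).
    weights:  K x D nested lists (one hypothetical projection vector per attribute).
    Returns K heatmaps, each of size H x W.
    """
    return [
        [[sum(f * w for f, w in zip(cell, w_k)) for cell in row] for row in features]
        for w_k in weights
    ]


def pool_attributes(heatmaps):
    """Global max pooling (an assumption): strongest response of each attribute."""
    return [max(max(row) for row in hm) for hm in heatmaps]


def class_scores(attr_scores, class_attributes):
    """Score each class by the dot product with its attribute description."""
    return {
        name: sum(a * s for a, s in zip(vec, attr_scores))
        for name, vec in class_attributes.items()
    }


def predict(features, weights, class_attributes):
    """Return the best-scoring class and the attribute heatmaps."""
    heatmaps = attribute_heatmaps(features, weights)
    scores = class_scores(pool_attributes(heatmaps), class_attributes)
    return max(scores, key=scores.get), heatmaps


# Toy 2x2 feature map with 2 channels and 2 attributes (made-up values).
toy_features = [[[1.0, 0.0], [0.0, 0.0]],
                [[0.0, 0.0], [0.0, 0.2]]]
toy_weights = [[1.0, 0.0], [0.0, 1.0]]
# Unseen classes are described only by attribute vectors, never by images.
toy_classes = {"unseen_a": [1.0, 0.0], "unseen_b": [0.0, 1.0]}

label, maps = predict(toy_features, toy_weights, toy_classes)
print(label)  # -> unseen_a
```

Under these assumptions no attention or part detection module is needed: the location where each attribute fires is read off the heatmap itself, which is also what makes the representation inspectable per attribute.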

