CV · Mar 7, 2022

Towards Unbiased Multi-label Zero-Shot Learning with Pyramid and Semantic Attention

arXiv:2203.03483v1 · 26 citations · h-index: 18
Originality: Incremental advance
AI Analysis

This paper addresses bias in multi-label zero-shot learning for computer vision applications, though it appears incremental because it builds on existing attention mechanisms.

The paper tackles the bias towards major classes in multi-label zero-shot learning by proposing a framework that combines Pyramid Feature Attention and Semantic Attention to balance global and local information, and it reports significant performance improvements on the NUS-WIDE and Open-Image benchmarks.

Multi-label zero-shot learning extends conventional single-label zero-shot learning to a more realistic scenario that aims to recognize multiple unseen class labels for each input sample. Existing works usually exploit attention mechanisms to capture the correlation among different labels. However, most of them are biased towards a few major classes while neglecting most of the minor classes, which are equally important in the input samples, and may thus produce overly diffused attention maps that cannot sufficiently cover minor classes. We argue that the cause of the problem is disregarding the connection between major and minor classes, which correspond to global and local information, respectively. In this paper, we propose a novel framework for unbiased multi-label zero-shot learning that considers various class-specific regions to calibrate the training process of the classifier. Specifically, Pyramid Feature Attention (PFA) is proposed to build the correlation between the global and local information of samples and so balance the presence of each class. Meanwhile, for the generated semantic representations of input samples, we propose Semantic Attention (SA) to strengthen the element-wise correlation among these vectors, which encourages a coordinated representation of them. Extensive experiments on the large-scale multi-label zero-shot benchmarks NUS-WIDE and Open-Image demonstrate that the proposed method surpasses other representative methods by significant margins.
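To make the general idea concrete, here is a minimal sketch of attention-based multi-label zero-shot scoring. It is not the paper's architecture: the abstract gives no layer details, so every function name, the pooling scheme in `pyramid_levels`, the use of the global feature as the attention query, and the random toy data are assumptions chosen for illustration. It also assumes region features have already been projected into the same space as the label word embeddings.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, vectors):
    """Scaled dot-product attention: weighted average of `vectors` for one query."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, v) / scale for v in vectors])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(len(query))]

def pyramid_levels(regions, group_sizes=(1, 2, 4)):
    """Hypothetical pyramid: average consecutive groups of region features."""
    return [
        [mean(regions[i:i + g]) for i in range(0, len(regions), g)]
        for g in group_sizes
    ]

def score_labels(regions, label_embeddings):
    global_feat = mean(regions)
    # Global-to-local step (assumption): the global feature queries each level.
    level_vecs = [attend(global_feat, level) for level in pyramid_levels(regions)]
    # Self-attention among the per-level vectors, a stand-in for semantic attention.
    refined = [attend(v, level_vecs) for v in level_vecs]
    image_vec = mean(refined)
    # Zero-shot scoring: any label with a word embedding can be scored, seen or unseen.
    return {name: dot(image_vec, emb) for name, emb in label_embeddings.items()}

random.seed(0)
dim = 8
regions = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(8)]
labels = {name: [random.gauss(0, 1) for _ in range(dim)] for name in ("sky", "dog", "boat")}
scores = score_labels(regions, labels)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this sketch, `ranked` orders the candidate labels by compatibility with the image, and a multi-label prediction would take the top few entries. Because scoring depends only on label embeddings, labels never seen during training can be ranked the same way.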

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
