CV · Jul 10, 2018

Deep Imbalanced Attribute Classification using Visual Attention Aggregation

arXiv:1807.03903v2 · 235 citations
Originality: Incremental advance
AI Analysis

The paper addresses the recognition of human visual attributes for applications such as image description and human identification. The advance is incremental: it builds on existing attention mechanisms and adds specific improvements.

The paper tackles visual attribute classification, which is challenging because of its multi-label nature, class imbalance, and lack of spatial annotations. It proposes a method that aggregates visual attention masks and introduces a loss function to handle the imbalance, achieving state-of-the-art results on the PETA and WIDER-Attribute datasets.

For many computer vision applications, such as image description and human identification, recognizing the visual attributes of humans is an essential yet challenging problem. Its challenges originate from its multi-label nature, the large underlying class imbalance and the lack of spatial annotations. Existing methods follow either a computer vision approach while failing to account for class imbalance, or explore machine learning solutions, which disregard the spatial and semantic relations that exist in the images. With that in mind, we propose an effective method that extracts and aggregates visual attention masks at different scales. We introduce a loss function to handle class imbalance at both the class and the instance level and further demonstrate that penalizing attention masks with high prediction variance accounts for the weak supervision of the attention mechanism. By identifying and addressing these challenges, we achieve state-of-the-art results with a simple attention mechanism on both the PETA and WIDER-Attribute datasets without additional context or side information.
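The abstract does not state the loss formula, so the following is only a minimal sketch of one way a multi-label loss could handle imbalance at both levels. It assumes class-level weights that grow as an attribute becomes rarer and a focal-style modulating factor that down-weights easy instances. The function names, the `exp(-ratio)` weighting, and the `gamma` parameter are illustrative assumptions, not the paper's confirmed formulation.

```python
import math

def class_weights(positive_ratios):
    """Hypothetical class-level weights: attributes with a smaller share of
    positive examples receive a larger weight (assumed form exp(-ratio))."""
    return [math.exp(-r) for r in positive_ratios]

def weighted_focal_loss(probs, labels, weights, gamma=2.0, eps=1e-7):
    """Sum of per-attribute losses for a single sample.

    probs   -- predicted probability for each attribute
    labels  -- ground-truth 0/1 label for each attribute
    weights -- class-level weight for each attribute
    gamma   -- assumed focusing parameter; gamma=0 reduces to weighted
               binary cross-entropy
    """
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p       # probability given to the true label
        # (1 - p_t)^gamma shrinks the loss of well-classified instances
        total += -w * (1.0 - p_t) ** gamma * math.log(p_t)
    return total
```

With this sketch, a confidently correct prediction contributes far less than a confidently wrong one, and a rare attribute contributes more than a common one at the same prediction confidence.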

Code Implementations: 2 repos
