CV
Nov 12, 2019

Pose Guided Attention for Multi-label Fashion Image Classification

arXiv:1911.05024v1
23 citations
Originality: Incremental advance
AI Analysis

This work addresses fashion image classification for applications such as e-commerce and style analysis. It is an incremental advance because it builds on existing attention and pose-estimation methods.

The authors tackle multi-label fashion image classification by proposing a visual semantic attention model supervised by automatic pose extraction. The model outperformed the state of the art on an in-house dataset and performed competitively on DeepFashion without using landmark annotations.

We propose a compact framework with guided attention for multi-label classification in the fashion domain. Our visual semantic attention model (VSAM) is supervised by automatic pose extraction, creating a discriminative feature space. VSAM outperforms the state of the art on an in-house dataset and performs on par with previous works on the DeepFashion dataset, even without using any landmark annotations. Additionally, we show that our semantic attention module brings robustness to large quantities of wrong annotations and provides more interpretable results.
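The idea of attention guided by pose can be illustrated with a small NumPy sketch. This is not the authors' VSAM implementation: the spatial-softmax pooling, the cross-entropy guidance term, the linear classifier, the weight `lam`, and every function name below are assumptions chosen only to show how an attention map might be pushed toward a pose-derived heatmap while a multi-label loss is optimized on the attended features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_pool(features, att_logits):
    """Pool a (C, H, W) feature map with a spatial softmax over (H, W) logits.

    Returns the pooled (C,) vector and the (H, W) attention map.
    """
    flat = att_logits.reshape(-1)
    flat = flat - flat.max()  # subtract the max for numerical stability
    weights = np.exp(flat) / np.exp(flat).sum()
    att = weights.reshape(att_logits.shape)
    pooled = (features * att[None, :, :]).sum(axis=(1, 2))
    return pooled, att

def pose_guidance_loss(att, pose_heatmap, eps=1e-8):
    """Cross-entropy between a normalized pose heatmap and the attention map.

    Hypothetical guidance term: low when attention mass sits on pose regions.
    """
    target = pose_heatmap / (pose_heatmap.sum() + eps)
    return float(-(target * np.log(att + eps)).sum())

def multilabel_bce(logits, labels, eps=1e-8):
    """Mean binary cross-entropy with one independent sigmoid per label."""
    probs = sigmoid(logits)
    per_label = labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps)
    return float(-per_label.mean())

def total_loss(features, att_logits, pose_heatmap, W, b, labels, lam=1.0):
    """Classification loss plus lam times the (assumed) pose guidance term."""
    pooled, att = attention_pool(features, att_logits)
    logits = W @ pooled + b  # linear multi-label classifier, an assumption
    return multilabel_bce(logits, labels) + lam * pose_guidance_loss(att, pose_heatmap)
```

In this sketch the pose heatmap would come from an off-the-shelf pose estimator, so no manual landmark annotations are needed at training time, which matches the setting the abstract describes.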

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
