CV | Dec 26, 2015

Part-Stacked CNN for Fine-Grained Visual Categorization

arXiv:1512.08086v1 | 460 citations
Originality Highly original
AI Analysis

This work addresses both interpretability and accuracy in fine-grained visual categorization, targeting applications that require human-understandable models. It represents an incremental improvement built on a novel hybrid method.

The paper tackles fine-grained visual categorization by proposing a Part-Stacked CNN that models subtle differences from object parts to improve classification accuracy and interpretability, achieving efficient inference at 20 frames/sec on the CUB-200-2011 dataset.

In the context of fine-grained visual categorization, the ability to interpret models as human-understandable visual manuals is sometimes as important as achieving high classification accuracy. In this paper, we propose a novel Part-Stacked CNN architecture that explicitly explains the fine-grained recognition process by modeling subtle differences from object parts. Based on manually-labeled strong part annotations, the proposed architecture consists of a fully convolutional network to locate multiple object parts and a two-stream classification network that encodes object-level and part-level cues simultaneously. By adopting a set of sharing strategies between the computation of multiple object parts, the proposed architecture is very efficient, running at 20 frames/sec during inference. Experimental results on the CUB-200-2011 dataset reveal the effectiveness of the proposed architecture, from both the perspective of classification accuracy and model interpretability.
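
The two-stage idea in the abstract (locate parts, then combine object-level and part-level cues computed from shared features) can be illustrated with a toy sketch. This is not the authors' network: the function names `locate_part`, `crop_mean`, and `two_stream_descriptor` are hypothetical, the heatmaps stand in for the output of a localization network, and simple averaging stands in for learned convolutional features. The only point it tries to show is that one feature map is computed once and reused for every part.

```python
# Illustrative sketch only: toy pure-Python stand-ins, not the paper's implementation.

def locate_part(heatmap):
    """Return (row, col) of the highest-scoring cell in one part's heatmap."""
    best = (0, 0)
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > heatmap[best[0]][best[1]]:
                best = (r, c)
    return best


def crop_mean(feature_map, center, radius=1):
    """Average the shared feature map over a small window around a part location."""
    rows, cols = len(feature_map), len(feature_map[0])
    r0, r1 = max(0, center[0] - radius), min(rows, center[0] + radius + 1)
    c0, c1 = max(0, center[1] - radius), min(cols, center[1] + radius + 1)
    window = [feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(window) / len(window)


def two_stream_descriptor(feature_map, part_heatmaps):
    """Concatenate one object-level cue with one cue per located part."""
    cells = [v for row in feature_map for v in row]
    object_cue = sum(cells) / len(cells)  # object stream: pool over the whole map
    # Part stream: the same feature_map is reused for every part (assumed sharing).
    part_cues = [crop_mean(feature_map, locate_part(h)) for h in part_heatmaps]
    return [object_cue] + part_cues


# Hypothetical 4x4 feature map and two part heatmaps (e.g. "head" and "tail").
feature_map = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
head = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
tail = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
descriptor = two_stream_descriptor(feature_map, [head, tail])
```

In this toy setup the descriptor has one entry for the whole object followed by one entry per part, so a downstream classifier could attribute a decision to individual parts, which is the kind of interpretability the abstract describes.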
