CV · Dec 26, 2015

Part-Stacked CNN for Fine-Grained Visual Categorization

Shaoli Huang, Zhe Xu, Dacheng Tao, Ya Zhang

arXiv:1512.08086v1 · 28.2 · 460 citations

Originality: Highly original

AI Analysis

This work addresses both interpretability and accuracy in fine-grained visual categorization, targeting applications that require human-understandable models. It represents an incremental improvement built on a novel hybrid method.

The paper tackles fine-grained visual categorization by proposing a Part-Stacked CNN that models subtle differences from object parts to improve classification accuracy and interpretability, achieving efficient inference at 20 frames/sec on the CUB-200-2011 dataset.

In the context of fine-grained visual categorization, the ability to interpret models as human-understandable visual manuals is sometimes as important as achieving high classification accuracy. In this paper, we propose a novel Part-Stacked CNN architecture that explicitly explains the fine-grained recognition process by modeling subtle differences from object parts. Based on manually-labeled strong part annotations, the proposed architecture consists of a fully convolutional network to locate multiple object parts and a two-stream classification network that encodes object-level and part-level cues simultaneously. By adopting a set of sharing strategies between the computation of multiple object parts, the proposed architecture is very efficient, running at 20 frames/sec during inference. Experimental results on the CUB-200-2011 dataset reveal the effectiveness of the proposed architecture, from both the perspective of classification accuracy and model interpretability.
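
The two-stream idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the peak-picking localization, the average pooling, and the window size are all assumptions chosen for brevity, and plain nested lists stand in for CNN tensors. It only shows how part locations taken from per-part heatmaps might be used to pool part-level features from one shared feature map and concatenate them with an object-level feature.

```python
# Illustrative sketch only. All names and pooling choices are assumptions,
# not the architecture or code from the paper.

def locate_parts(heatmaps):
    """Return the (row, col) of the peak response in each part heatmap."""
    locations = []
    for hm in heatmaps:
        cells = ((r, c) for r in range(len(hm)) for c in range(len(hm[0])))
        locations.append(max(cells, key=lambda rc: hm[rc[0]][rc[1]]))
    return locations


def crop_part_feature(feature_map, center, size=1):
    """Average-pool a window of the shared feature map around one part."""
    rows, cols = len(feature_map), len(feature_map[0])
    r0, c0 = center
    window = [
        feature_map[r][c]
        for r in range(max(0, r0 - size), min(rows, r0 + size + 1))
        for c in range(max(0, c0 - size), min(cols, c0 + size + 1))
    ]
    channels = len(window[0])
    return [sum(v[k] for v in window) / len(window) for k in range(channels)]


def two_stream_descriptor(feature_map, heatmaps):
    """Concatenate an object-level feature with one feature per located part."""
    rows, cols = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    # Object-level stream: global average pool over the whole feature map.
    object_feat = [
        sum(feature_map[r][c][k] for r in range(rows) for c in range(cols))
        / (rows * cols)
        for k in range(channels)
    ]
    # Part-level stream: the same feature map is reused for every part,
    # so the expensive feature computation happens only once.
    part_feats = []
    for center in locate_parts(heatmaps):
        part_feats.extend(crop_part_feature(feature_map, center))
    return object_feat + part_feats
```

In this sketch the descriptor length is `channels * (1 + number_of_parts)`, and a classifier would be trained on top of it. Reusing a single feature map across parts is one simple reading of the "sharing strategies" mentioned in the abstract; the paper's actual mechanism may differ.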
