CV, May 23, 2016

Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Image Recognition

arXiv:1605.06878v1, 130 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of distinguishing highly similar categories in computer vision, though it appears incremental because it builds on existing part-based approaches.

The paper tackles fine-grained image recognition by proposing Mask-CNN, an end-to-end model that localizes discriminative parts and uses the resulting masks to select convolutional descriptors. The authors report the highest recognition accuracy among the state-of-the-art methods they compare against.

Fine-grained image recognition is a challenging computer vision problem, due to the small inter-class variations caused by highly similar subordinate categories, and the large intra-class variations in poses, scales and rotations. In this paper, we propose a novel end-to-end Mask-CNN model without the fully connected layers for fine-grained recognition. Based on the part annotations of fine-grained images, the proposed model consists of a fully convolutional network to both locate the discriminative parts (e.g., head and torso), and more importantly generate object/part masks for selecting useful and meaningful convolutional descriptors. After that, a four-stream Mask-CNN model is built for aggregating the selected object- and part-level descriptors simultaneously. The proposed Mask-CNN model has the smallest number of parameters, lowest feature dimensionality and highest recognition accuracy when compared with state-of-the-art fine-grained approaches.
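
The descriptor-selection idea in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes a feature map stored as a grid of per-location descriptors and a binary part mask of the same spatial size. The function names (`select_descriptors`, `aggregate`), the toy values, and the choice to aggregate by concatenating average- and max-pooled descriptors are assumptions made for illustration; the abstract does not specify the aggregation step, and this is not the authors' implementation.

```python
from typing import List, Sequence

Descriptor = List[float]


def select_descriptors(
    feature_map: Sequence[Sequence[Descriptor]],
    mask: Sequence[Sequence[int]],
) -> List[Descriptor]:
    """Keep the descriptor at each spatial location where the mask is 1."""
    selected = []
    for feature_row, mask_row in zip(feature_map, mask):
        for descriptor, keep in zip(feature_row, mask_row):
            if keep:
                selected.append(list(descriptor))
    return selected


def aggregate(descriptors: Sequence[Descriptor]) -> Descriptor:
    """Concatenate average- and max-pooled descriptors (assumed aggregation)."""
    if not descriptors:
        raise ValueError("mask selected no descriptors")
    channels = list(zip(*descriptors))  # one tuple of values per channel
    avg_pooled = [sum(values) / len(values) for values in channels]
    max_pooled = [max(values) for values in channels]
    return avg_pooled + max_pooled


# Toy 2x2 feature map with 2-channel descriptors (hypothetical values).
feature_map = [
    [[1.0, 0.0], [3.0, 2.0]],
    [[5.0, 4.0], [9.0, 9.0]],
]
# Hypothetical part mask: 1 keeps a location, 0 discards it.
head_mask = [
    [1, 1],
    [1, 0],
]

head_descriptors = select_descriptors(feature_map, head_mask)
head_feature = aggregate(head_descriptors)
print(head_feature)  # [3.0, 2.0, 5.0, 4.0]
```

In a multi-stream setup such as the one the abstract describes, each stream would presumably produce its own aggregated vector in this way (for example one per part mask and one for the object mask) before the vectors are combined for classification.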

