CV
Mar 7, 2019

Weakly Supervised Complementary Parts Models for Fine-Grained Image Classification from the Bottom Up

arXiv:1903.02827v1
292 citations
Originality: Incremental advance
AI Analysis

This work addresses fine-grained image classification. It offers an approach for capturing complementary object parts, though the advance is incremental: it builds on existing weakly supervised detection and segmentation methods.

The paper tackles a known weakness in fine-grained image classification: deep convolutional neural networks trained with image-level labels tend to focus on the most discriminative parts and miss complementary information. To recover that information, it builds complementary parts models in a weakly supervised manner. The authors report significant improvements over their baseline models, and gains of 6.7%, 2.8%, and 5.2% over state-of-the-art algorithms on Stanford Dogs 120, Caltech-UCSD Birds 2011-200, and Caltech 256, respectively.

Given a training dataset composed of images and corresponding category labels, deep convolutional neural networks show a strong ability in mining discriminative parts for image classification. However, deep convolutional neural networks trained with image-level labels only tend to focus on the most discriminative parts while missing other object parts, which could provide complementary information. In this paper, we approach this problem from a different perspective. We build complementary parts models in a weakly supervised manner to retrieve information suppressed by dominant object parts detected by convolutional neural networks. Given image-level labels only, we first extract rough object instances by performing weakly supervised object detection and instance segmentation using Mask R-CNN and CRF-based segmentation. Then we estimate and search for the best parts model for each object instance under the principle of preserving as much diversity as possible. In the last stage, we build a bi-directional long short-term memory (LSTM) network to fuse and encode the partial information of these complementary parts into a comprehensive feature for image classification. Experimental results indicate that the proposed method not only achieves significant improvement over our baseline models, but also outperforms state-of-the-art algorithms by a large margin (6.7%, 2.8%, 5.2% respectively) on Stanford Dogs 120, Caltech-UCSD Birds 2011-200 and Caltech 256.
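The abstract's principle of "preserving as much diversity as possible" when choosing parts can be illustrated with a small sketch. This is not the paper's actual parts-model search, which the abstract does not specify. It is a hypothetical greedy stand-in that keeps high-scoring candidate boxes only when they overlap little with boxes already kept. The names `iou`, `select_diverse_parts`, and the `max_iou` threshold are assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical part box


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_diverse_parts(
    candidates: List[Tuple[Box, float]],
    num_parts: int,
    max_iou: float = 0.3,  # assumed overlap threshold, not from the paper
) -> List[Box]:
    """Greedily keep the highest-scoring boxes that overlap little with kept ones."""
    kept: List[Box] = []
    for box, _score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if len(kept) >= num_parts:
            break
        # Accept a box only if it adds a region not already covered.
        if all(iou(box, k) <= max_iou for k in kept):
            kept.append(box)
    return kept


# Made-up candidate boxes with confidence scores.
candidates = [
    ((0.0, 0.0, 10.0, 10.0), 0.9),   # dominant part
    ((1.0, 1.0, 11.0, 11.0), 0.8),   # near-duplicate of the dominant part
    ((20.0, 0.0, 30.0, 10.0), 0.7),  # a different region
    ((0.0, 20.0, 10.0, 30.0), 0.6),  # another different region
]
parts = select_diverse_parts(candidates, num_parts=3)
```

In this toy input the near-duplicate box is skipped in favor of two lower-scoring boxes that cover other regions, which is the behavior the diversity principle asks for. The selected parts would then be encoded and fused by the bi-directional LSTM stage described in the abstract.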

Code Implementations: 1 repo