Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition
This work addresses the problem of recognizing subtle visual differences between objects in computer vision applications, and represents an incremental improvement over existing bilinear pooling methods.
The paper tackles fine-grained visual recognition by proposing a hierarchical bilinear pooling framework that captures inter-layer part feature interactions, achieving state-of-the-art results on widely used datasets.
Fine-grained visual recognition is challenging because it relies heavily on modeling various semantic parts and on fine-grained feature learning. Bilinear-pooling-based models have been shown to be effective for fine-grained recognition, but most previous approaches neglect the fact that inter-layer part feature interaction and fine-grained feature learning are mutually correlated and can reinforce each other. In this paper, we present a novel model to address these issues. First, we propose a cross-layer bilinear pooling approach that captures inter-layer part feature relations and outperforms other bilinear-pooling-based approaches. Second, we propose a novel hierarchical bilinear pooling framework that integrates multiple cross-layer bilinear features to enhance their representation capability. Our formulation is intuitive and efficient, and it achieves state-of-the-art results on widely used fine-grained recognition datasets.
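The abstract gives no formulas, so what follows is only a minimal sketch of the two ideas under stated assumptions. It assumes a factorized (low-rank, Hadamard-product) form of bilinear pooling with one projection matrix per layer. At each spatial location, the features of two layers are projected into a shared space and multiplied elementwise. The result is sum-pooled over locations, then passed through a signed square root and L2 normalization. For the hierarchical step, the features of every layer pair are concatenated. The projection matrices here are random placeholders rather than learned weights, the final linear classifier is omitted, and the function names (`cross_layer_bilinear`, `hierarchical_bilinear`) are hypothetical. The sketch uses plain Python so that it stays self-contained.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # shape: (in_dim, out_dim)


def project(W: Matrix, v: Sequence[float]) -> Vector:
    """Return W^T v for a matrix W of shape (len(v), out_dim)."""
    out_dim = len(W[0])
    return [sum(v[i] * W[i][k] for i in range(len(v))) for k in range(out_dim)]


def normalize(z: Vector) -> Vector:
    """Signed square root followed by L2 normalization (assumed post-processing)."""
    z = [math.copysign(math.sqrt(abs(a)), a) for a in z]
    norm = math.sqrt(sum(a * a for a in z)) or 1.0  # avoid dividing by zero
    return [a / norm for a in z]


def cross_layer_bilinear(xs, ys, U: Matrix, V: Matrix) -> Vector:
    """Hypothetical factorized cross-layer bilinear pooling.

    xs, ys: per-location feature vectors from two layers on the same spatial grid.
    Each location contributes the Hadamard product (U^T x) * (V^T y);
    contributions are sum-pooled over locations and then normalized.
    """
    assert len(xs) == len(ys), "layers must share the same spatial grid"
    d = len(U[0])
    pooled = [0.0] * d
    for x, y in zip(xs, ys):
        px, py = project(U, x), project(V, y)
        for k in range(d):
            pooled[k] += px[k] * py[k]
    return normalize(pooled)


def hierarchical_bilinear(layers, projections) -> Vector:
    """Concatenate cross-layer bilinear features for every pair of layers."""
    features: Vector = []
    for i, j in combinations(range(len(layers)), 2):
        features += cross_layer_bilinear(layers[i], layers[j],
                                         projections[i], projections[j])
    return features


# Usage with placeholder data: three layers, 4 spatial locations, 8 channels each.
rng = random.Random(0)
num_locations, channels, d = 4, 8, 16
layers = [[[rng.random() for _ in range(channels)] for _ in range(num_locations)]
          for _ in range(3)]
# Random stand-ins for learned per-layer projection matrices.
projections = [[[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(channels)]
               for _ in range(3)]
feature = hierarchical_bilinear(layers, projections)
print(len(feature))  # 3 layer pairs * 16 dims = 48
```

Each cross-layer block is normalized separately before concatenation, so no single layer pair dominates the combined feature. This is one plausible design choice and not necessarily the paper's.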