CV | Nov 9, 2019

Learning Deep Bilinear Transformation for Fine-grained Image Representation

Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo

arXiv:1911.03621v1 | 19.7174 citations | Has Code

Originality: Highly original

AI Analysis

The work addresses the computational bottleneck that researchers and practitioners face when using bilinear methods for fine-grained image recognition, offering an incremental improvement over existing approaches.

The paper tackles the high computational cost of bilinear feature transformations in deep neural networks for fine-grained image recognition. It proposes a deep bilinear transformation (DBT) block that divides channels into groups and computes pairwise interactions only within each group, which reduces the number of interactions. The authors report new state-of-the-art results on the CUB-Bird, Stanford-Car, and FGVC-Aircraft benchmarks.

Bilinear feature transformation has shown state-of-the-art performance in learning fine-grained image representations. However, the computational cost of learning pairwise interactions between deep feature channels is prohibitively high, which restricts the use of this powerful transformation in deep neural networks. In this paper, we propose a deep bilinear transformation (DBT) block, which can be deeply stacked in convolutional neural networks to learn fine-grained image representations. The DBT block uniformly divides input channels into several semantic groups. Because the bilinear transformation can then be represented by calculating pairwise interactions within each group, the computational cost is greatly reduced. The output of each block is obtained by aggregating the intra-group bilinear features and adding residuals from the entire input features. We found that the proposed network achieves new state-of-the-art results on several fine-grained image recognition benchmarks, including CUB-Bird, Stanford-Car, and FGVC-Aircraft.
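To make the cost argument in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the 256-channel and 16-group figures, and the choice to aggregate groups by an element-wise sum are all assumptions made for illustration, and the residual connection is omitted.

```python
from typing import List, Sequence


def pairwise_interactions(channels: int, groups: int = 1) -> int:
    """Count channel-pair products when interactions stay inside each group."""
    if channels % groups != 0:
        raise ValueError("channels must divide evenly into groups")
    per_group = channels // groups
    # Each group contributes per_group * per_group products.
    return groups * per_group * per_group


def group_bilinear(features: Sequence[float], groups: int) -> List[float]:
    """Toy intra-group bilinear feature for a single spatial position.

    Assumption: per-group outer products are aggregated by an element-wise
    sum. The paper's exact aggregation and residual path are not shown here.
    """
    if len(features) % groups != 0:
        raise ValueError("feature length must divide evenly into groups")
    size = len(features) // groups
    out = [0.0] * (size * size)
    for g in range(groups):
        chunk = features[g * size:(g + 1) * size]
        # Outer product of the group's channels with themselves.
        for i, a in enumerate(chunk):
            for j, b in enumerate(chunk):
                out[i * size + j] += a * b
    return out


full = pairwise_interactions(256)                # all channel pairs
grouped = pairwise_interactions(256, groups=16)  # pairs within groups only
print(full, grouped, full // grouped)            # prints: 65536 4096 16
```

Under these assumptions, splitting C channels into G equal groups cuts the number of pairwise products from C^2 to C^2 / G, so 16 groups give a 16-fold reduction in this toy count.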
