Compact Bilinear Pooling
Compact bilinear pooling addresses the impracticality of high-dimensional bilinear features for visual recognition, offering a more efficient alternative for computer vision researchers and practitioners.
The paper tackles the high dimensionality of bilinear features in visual tasks by proposing two compact bilinear representations that reduce the feature size to a few thousand dimensions while maintaining discriminative power. The representations support end-to-end optimization and prove useful for image classification and few-shot learning across several datasets.
Bilinear models have been shown to achieve impressive performance on a wide range of visual tasks, such as semantic segmentation, fine-grained recognition, and face recognition. However, bilinear features are high dimensional, typically on the order of hundreds of thousands to a few million, which makes them impractical for subsequent analysis. We propose two compact bilinear representations with the same discriminative power as the full bilinear representation but with only a few thousand dimensions. Our compact representations allow back-propagation of classification errors, enabling end-to-end optimization of the visual recognition system. The compact bilinear representations are derived through a novel kernelized analysis of bilinear pooling, which provides insights into the discriminative power of bilinear pooling and a platform for further research in compact pooling methods. Experiments illustrate the utility of the proposed representations for image classification and few-shot learning across several datasets.
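To make the idea of a compact bilinear representation concrete, the following is a minimal NumPy sketch. Full bilinear pooling sums the outer product of each local descriptor with itself, giving c^2 dimensions for c-dimensional descriptors. The abstract does not name the two proposed representations, so the choice of Tensor Sketch here (two count sketches combined by circular convolution via the FFT) is an assumption made for illustration. The function names, the output size d, and the random hash parameters are likewise hypothetical and not taken from the paper.

```python
import numpy as np

def make_sketch_params(c, d, rng):
    """Draw two independent random hashes (bucket index and sign) for c inputs."""
    h = rng.integers(0, d, size=(2, c))        # bucket index in [0, d)
    s = rng.choice([-1.0, 1.0], size=(2, c))   # random sign per input dimension
    return h, s

def count_sketch(x, h, s, d):
    """Project x (length c) to length d: out[h[i]] += s[i] * x[i]."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)  # unbuffered add, so colliding buckets accumulate
    return out

def tensor_sketch(x, h, s, d):
    """Approximate the outer product of x with itself in d dimensions.

    The circular convolution of two count sketches is computed as an
    element-wise product in the Fourier domain.
    """
    f1 = np.fft.rfft(count_sketch(x, h[0], s[0], d))
    f2 = np.fft.rfft(count_sketch(x, h[1], s[1], d))
    return np.fft.irfft(f1 * f2, n=d)

def compact_bilinear_pool(features, h, s, d):
    """Sum-pool the sketched descriptors over all spatial locations.

    features: array of shape (num_locations, c).
    """
    return sum(tensor_sketch(x, h, s, d) for x in features)
```

Under these assumptions, the inner product of two sketched vectors approximates the squared inner product of the original descriptors, which is the kernel induced by the full outer-product feature. With c = 512 and d of a few thousand, the pooled vector is much smaller than the 262,144-dimensional full bilinear feature. Every step is a linear map or an element-wise product, so gradients can flow through it in an automatic differentiation framework.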