CV Aug 7, 2021

Information Bottleneck Approach to Spatial Attention Learning

Qiuxia Lai, Yu Li, Ailing Zeng, Minhao Liu, Hanqiu Sun, Qiang Xu

arXiv:2108.03418v26.515 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for more efficient and interpretable attention mechanisms in deep learning for computer vision. It is an incremental advance, since it builds on existing IB theory and attention methods.

The paper tackles the problem of incorporating information constraints into attention mechanisms for deep neural networks in visual recognition. It proposes an Information Bottleneck-inspired spatial attention module that yields interpretable attention maps and improves performance on tasks such as image classification and fine-grained recognition.

The selective visual attention mechanism in the human visual system (HVS) restricts the amount of information to reach visual awareness for perceiving natural scenes, allowing near real-time information processing with limited computational capacity [Koch and Ullman, 1987]. This kind of selectivity acts as an 'Information Bottleneck (IB)', which seeks a trade-off between information compression and predictive accuracy. However, such information constraints are rarely explored in the attention mechanism for deep neural networks (DNNs). In this paper, we propose an IB-inspired spatial attention module for DNN structures built for visual recognition. The module takes as input an intermediate representation of the input image, and outputs a variational 2D attention map that minimizes the mutual information (MI) between the attention-modulated representation and the input, while maximizing the MI between the attention-modulated representation and the task label. To further restrict the information bypassed by the attention map, we quantize the continuous attention scores to a set of learnable anchor values during training. Extensive experiments show that the proposed IB-inspired spatial attention mechanism can yield attention maps that neatly highlight the regions of interest while suppressing backgrounds, and bootstrap standard DNN structures for visual recognition tasks (e.g., image classification, fine-grained recognition, cross-domain classification). The attention maps are interpretable for the decision making of the DNNs as verified in the experiments. Our code is available at https://github.com/ashleylqx/AIB.git.
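The quantization step mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' AIB repository: the function names `quantize_to_anchors` and `apply_attention`, the nearest-anchor rule, and all numeric values are assumptions chosen for illustration. In the paper the anchor values are learned during training, and the sketch does not show how gradients pass through the quantization.

```python
# Illustrative sketch only; NOT the authors' AIB implementation.
# Function names, the nearest-anchor rule, and all values are assumptions.
from typing import List, Sequence


def quantize_to_anchors(scores: Sequence[Sequence[float]],
                        anchors: Sequence[float]) -> List[List[float]]:
    """Replace each continuous attention score with its nearest anchor value."""
    return [[min(anchors, key=lambda a: abs(a - s)) for s in row]
            for row in scores]


def apply_attention(features: Sequence[Sequence[float]],
                    attention: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each spatial location of a 2D feature map by its attention weight."""
    return [[f * a for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(features, attention)]


# Hypothetical 2x3 attention map with continuous scores in [0, 1].
attention = [[0.05, 0.48, 0.93],
             [0.30, 0.71, 0.10]]

# Hypothetical anchors; fixed here, whereas the paper describes them as learnable.
anchors = [0.0, 0.5, 1.0]

# Hypothetical single-channel feature map of the same spatial size.
features = [[2.0, 2.0, 2.0],
            [4.0, 4.0, 4.0]]

quantized = quantize_to_anchors(attention, anchors)
modulated = apply_attention(features, quantized)

print(quantized)   # attention map restricted to the anchor set
print(modulated)   # attention-modulated representation
```

Restricting the map to a small set of values limits how much information the attention scores themselves can carry, which is the motivation the abstract gives for quantizing.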
