CV · Nov 3, 2021

Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention

arXiv:2111.02018v22.61 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses robustness and efficiency issues in image classification, particularly for applications requiring reliable performance under adversarial conditions, though it is incremental as it builds on existing attention and recurrent mechanisms.

The paper tackles the challenges of high computation and lack of robustness in convolutional neural networks by proposing the Multi-Glimpse Network (MGNet), which uses a recurrent downsampled attention mechanism to focus on task-relevant regions, resulting in improved accuracy on common corruptions and adversarial attacks with reduced computational cost.

Most feedforward convolutional neural networks spend roughly the same effort on each pixel. Yet human visual recognition is an interaction between eye movements and spatial attention, in which we take several glimpses of an object in different regions. Inspired by this observation, we propose an end-to-end trainable Multi-Glimpse Network (MGNet), which aims to tackle the challenges of high computation and the lack of robustness using a recurrent downsampled attention mechanism. Specifically, MGNet sequentially selects task-relevant regions of an image to focus on and then adaptively combines all collected information for the final prediction. MGNet exhibits strong resistance to adversarial attacks and common corruptions with less computation. MGNet is also inherently more interpretable, as it explicitly shows where it focuses during each iteration. Our experiments on ImageNet100 demonstrate the potential of recurrent downsampled attention mechanisms to improve on the single-pass feedforward approach. For example, MGNet improves accuracy under common corruptions by 4.76% on average at only 36.9% of the computational cost. Moreover, while the baseline's accuracy drops to 7.6%, MGNet maintains 44.2% accuracy under the same PGD attack strength with a ResNet-50 backbone. Our code is available at https://github.com/siahuat0727/MGNet.
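The glimpse loop described in the abstract (select a region, downsample it, update a recurrent state, combine the per-step predictions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`extract_glimpse`, `init_params`, `multi_glimpse_predict`), the random linear layers standing in for a CNN backbone, the sigmoid location head, and the plain averaging of logits (a stand-in for the adaptive combination the abstract mentions) are all assumptions made for illustration.

```python
import numpy as np

def extract_glimpse(image, center, scale, out_size):
    """Crop a region around `center` (fractions of H, W) covering `scale`
    of each side, downsampled to out_size x out_size by nearest neighbour."""
    h, w = image.shape[:2]
    cy, cx = center[0] * h, center[1] * w
    ys = np.linspace(cy - scale * h / 2.0, cy + scale * h / 2.0, out_size)
    xs = np.linspace(cx - scale * w / 2.0, cx + scale * w / 2.0, out_size)
    ys = np.clip(np.round(ys).astype(int), 0, h - 1)
    xs = np.clip(np.round(xs).astype(int), 0, w - 1)
    return image[np.ix_(ys, xs)]

def init_params(feat_dim, hidden, num_classes, seed=0):
    """Hypothetical random weights; a real model would learn these end to end."""
    rng = np.random.default_rng(seed)
    return {
        "W_in": rng.normal(0, 0.1, (hidden, feat_dim)),     # glimpse encoder
        "W_state": rng.normal(0, 0.1, (hidden, hidden)),    # recurrent update
        "W_cls": rng.normal(0, 0.1, (num_classes, hidden)), # classifier head
        "W_loc": rng.normal(0, 0.1, (3, hidden)),           # next-region head
    }

def multi_glimpse_predict(image, params, num_glimpses=4, glimpse_size=8):
    state = np.zeros(params["W_state"].shape[0])
    # Assumed starting point: first glimpse is the whole image, downsampled.
    center, scale = np.array([0.5, 0.5]), 1.0
    logits_per_step, regions = [], []
    for _ in range(num_glimpses):
        glimpse = extract_glimpse(image, center, scale, glimpse_size)
        feat = glimpse.reshape(-1)
        state = np.tanh(params["W_in"] @ feat + params["W_state"] @ state)
        logits_per_step.append(params["W_cls"] @ state)
        regions.append((center.copy(), scale))
        # Propose the next region from the state: centre in (0, 1),
        # scale squashed into (0.25, 1.0).
        loc = 1.0 / (1.0 + np.exp(-(params["W_loc"] @ state)))
        center, scale = loc[:2], 0.25 + 0.75 * loc[2]
    # Assumption: simple average of per-glimpse logits.
    return np.mean(logits_per_step, axis=0), regions

image = np.random.default_rng(1).random((32, 32, 3))
params = init_params(feat_dim=8 * 8 * 3, hidden=16, num_classes=10)
logits, regions = multi_glimpse_predict(image, params)
```

Each step in this sketch processes an 8×8 glimpse rather than the full 32×32 input, which is the intuition behind the reduced computation, and the returned `regions` list records where each glimpse looked, mirroring the interpretability claim.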
