LSAS: Lightweight Sub-attention Strategy for Alleviating Attention Bias Problem
This paper addresses the attention bias problem in computer vision models, offering researchers and practitioners an incremental improvement to existing self-attention mechanisms.
The paper tackles the attention bias problem in deep neural networks for computer vision, where models focus on irrelevant or incomplete image regions, and proposes a lightweight sub-attention strategy (LSAS) that improves performance on benchmark datasets, achieving gains such as a 2.1% increase in accuracy on CIFAR-100.
In computer vision, the performance of deep neural networks (DNNs) is closely tied to their feature extraction ability, i.e., the ability to recognize and focus on key pixel regions in an image. In this paper, however, we show quantitatively and statistically that DNNs have a serious attention bias problem on many samples from some popular datasets: (1) Position bias: DNNs fully focus on label-independent regions; (2) Range bias: the regions a DNN focuses on are not completely contained in the ideal region. Moreover, we find that existing self-attention modules can alleviate these biases to a certain extent, but the biases remain non-negligible. To further mitigate them, we propose a lightweight sub-attention strategy (LSAS), which uses high-order sub-attention modules to improve the original self-attention modules. The effectiveness of LSAS is demonstrated by extensive experiments on widely used benchmark datasets and popular attention networks. We release our code at https://github.com/Qrange-group/LSAS to help other researchers reproduce the results of LSAS.
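To make the two bias definitions concrete, the following is a minimal sketch of how a single sample could be labeled, assuming the model's focused region and the ideal (label-relevant) region are both available as binary masks of the same shape. The function name `classify_attention_bias`, the mask representation, and the returned labels are illustrative assumptions, not the measurement procedure used in the paper.

```python
import numpy as np


def classify_attention_bias(focus_mask, ideal_mask):
    """Label one sample using the two bias definitions from the abstract.

    Both arguments are hypothetical binary masks (True = pixel selected);
    how they are obtained (e.g. by thresholding a heatmap) is an assumption
    left outside this sketch.
    """
    focus = np.asarray(focus_mask, dtype=bool)
    ideal = np.asarray(ideal_mask, dtype=bool)
    if focus.shape != ideal.shape:
        raise ValueError("focus_mask and ideal_mask must have the same shape")
    if not focus.any():
        return "no_focus"  # the model highlights nothing at all

    focused_pixels = int(focus.sum())
    focused_inside_ideal = int(np.logical_and(focus, ideal).sum())

    if focused_inside_ideal == 0:
        # Position bias: focus lies entirely on label-independent pixels.
        return "position_bias"
    if focused_inside_ideal < focused_pixels:
        # Range bias: focus is not completely contained in the ideal region.
        return "range_bias"
    return "unbiased"


# Toy 4x4 example: the ideal region is the central 2x2 block.
ideal = np.zeros((4, 4), dtype=bool)
ideal[1:3, 1:3] = True

corner = np.zeros((4, 4), dtype=bool)
corner[0, 0] = True            # focus entirely outside the ideal region

spill = np.zeros((4, 4), dtype=bool)
spill[1:3, 2:4] = True         # focus partly inside, partly outside

inside = np.zeros((4, 4), dtype=bool)
inside[1, 1] = True            # focus fully inside the ideal region

labels = [classify_attention_bias(m, ideal) for m in (corner, spill, inside)]
```

Under these assumptions, the three toy masks are labeled as position bias, range bias, and unbiased, respectively. A real evaluation would also need a principled way to binarize attention heatmaps and to define the ideal region, both of which this sketch leaves open.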