RFAConv: Receptive-Field Attention Convolution for Improving Convolutional Neural Networks
This work addresses a specific bottleneck in convolutional neural networks for computer vision tasks, offering an incremental improvement over current spatial attention methods.
Existing spatial attention mechanisms do not fully resolve the parameter-sharing problem for large convolutional kernels. The paper addresses this limitation by introducing Receptive-Field Attention (RFA) and RFAConv, which improve network performance with minimal computational overhead, as validated on datasets such as ImageNet and COCO.
Spatial attention mechanisms have become an important method for improving the performance of convolutional neural networks, but they have inherent limitations. This work examines how spatial attention operates and offers a new insight: the mechanism essentially addresses the problem of convolutional parameter sharing. With that problem addressed, a convolutional kernel can extract features efficiently by applying different weights at different locations. However, current spatial attention mechanisms attend only shallowly to spatial features, which is insufficient to resolve parameter sharing in convolutions with larger kernels. In response, we introduce a novel attention mechanism, Receptive-Field Attention (RFA). Compared to existing spatial attention methods, RFA not only focuses on receptive-field spatial features but also provides effective attention weights for large convolutional kernels. Building on RFA, we propose Receptive-Field Attention Convolution (RFAConv) as a replacement for standard convolution. It adds a nearly negligible amount of computational overhead and parameters while significantly improving network performance. Furthermore, this work shows that current spatial attention mechanisms need to give higher priority to receptive-field spatial features to optimize network performance. To validate the advantages of the proposed methods, we conduct extensive experiments on several authoritative datasets, including ImageNet, COCO, VOC, and Roboflow...
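The parameter-sharing point can be made concrete with a toy, single-channel sketch. It is not the RFAConv implementation; the abstract does not specify one. It assumes that attention weights are a softmax over the k×k positions inside each receptive field, and it uses a hypothetical scoring rule (the patch values themselves) purely for illustration. The function names `unfold`, `standard_conv`, and `rfa_conv_sketch` are invented here.

```python
import math


def unfold(image, k):
    """Collect the flattened k*k receptive-field patches (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    return [
        [
            [image[i + di][j + dj] for di in range(k) for dj in range(k)]
            for j in range(w - k + 1)
        ]
        for i in range(h - k + 1)
    ]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def standard_conv(image, kernel):
    """Standard convolution: the same flattened kernel is reused at every location."""
    k = math.isqrt(len(kernel))
    return [
        [sum(w * x for w, x in zip(kernel, patch)) for patch in row]
        for row in unfold(image, k)
    ]


def rfa_conv_sketch(image, kernel):
    """Toy receptive-field attention: each location gets its own effective kernel."""
    k = math.isqrt(len(kernel))
    out = []
    for row in unfold(image, k):
        out_row = []
        for patch in row:
            # Assumption: attention scores come from the patch values themselves.
            attn = softmax(patch)
            # Effective weight at this location is kernel[i] * attn[i], so the
            # shared kernel is modulated differently for every receptive field.
            out_row.append(sum(w * a * x for w, a, x in zip(kernel, attn, patch)))
        out.append(out_row)
    return out


if __name__ == "__main__":
    image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    kernel = [1.0] * 9  # flattened 3x3 kernel
    print(standard_conv(image, kernel))
    print(rfa_conv_sketch(image, kernel))
```

In this sketch the attention has one weight per kernel element per location, which is what lets it scale to larger kernels; a single per-pixel attention map, by contrast, would assign the same weight to a pixel in every receptive field that overlaps it.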