PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer
The paper addresses scale variance in CNNs for computer vision tasks, offering a drop-in replacement for standard convolution layers that yields incremental improvements.
The paper tackles the problem of scale sensitivity in Convolutional Neural Networks by introducing Poly-Scale Convolution (PSConv), which mixes dilation rates within kernels to enhance multi-scale feature aggregation without extra parameters or computational cost, achieving superior performance on ImageNet and MS COCO benchmarks.
Despite their strong modeling capacity, Convolutional Neural Networks (CNNs) are often sensitive to scale. To make CNNs more robust to scale variance, existing solutions have mostly focused on fusing multi-scale features from different layers or filters, while the finer-grained kernel space has been overlooked. We close this gap by exploiting multi-scale features at a finer granularity. The proposed convolution operation, named Poly-Scale Convolution (PSConv), mixes a spectrum of dilation rates and allocates them across the individual convolutional kernels of each filter within a single convolutional layer. Specifically, dilation rates vary cyclically along the input- and output-channel axes of the filters, so that one layer aggregates features over a wide range of scales. PSConv can serve as a drop-in replacement for vanilla convolution in many prevailing CNN backbones, enabling better representation learning without introducing additional parameters or computational complexity. Comprehensive experiments on the ImageNet and MS COCO benchmarks validate the superior performance of PSConv. Code and models are available at https://github.com/d-li14/PSConv.
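To make the idea of per-kernel dilation concrete, here is a minimal pure-Python sketch. It is not the authors' implementation (see the linked repository for that). It assumes a hypothetical cyclic schedule in which the kernel connecting output channel `o` to input channel `i` uses dilation `rates[(o + i) % len(rates)]`, so the rate shifts by one step along both channel axes. The helper names `dilation_pattern` and `poly_scale_conv2d` are illustrative only. The point it demonstrates is that every k x k kernel keeps the same number of weights while covering a different spatial extent, `d * (k - 1) + 1` pixels per side for dilation `d`.

```python
def dilation_pattern(out_channels, in_channels, rates):
    """Hypothetical cyclic schedule: kernel (o, i) gets rates[(o + i) % len(rates)]."""
    t = len(rates)
    return [[rates[(o + i) % t] for i in range(in_channels)] for o in range(out_channels)]


def poly_scale_conv2d(x, weight, rates):
    """Naive stride-1 convolution with 'same' zero padding and per-kernel dilation.

    x:      nested lists [C_in][H][W]
    weight: nested lists [C_out][C_in][k][k], k odd
    The weight tensor has the same shape as a vanilla convolution's; only the
    spacing of the kernel taps (the dilation) differs from kernel to kernel.
    """
    c_out, c_in = len(weight), len(x)
    h, w = len(x[0]), len(x[0][0])
    k = len(weight[0][0])
    pattern = dilation_pattern(c_out, c_in, rates)
    out = [[[0.0] * w for _ in range(h)] for _ in range(c_out)]
    for o in range(c_out):
        for i in range(c_in):
            d = pattern[o][i]
            pad = d * (k - 1) // 2  # keeps H x W unchanged for odd k
            for y in range(h):
                for xx in range(w):
                    acc = 0.0
                    for ky in range(k):
                        for kx in range(k):
                            sy, sx = y + ky * d - pad, xx + kx * d - pad
                            if 0 <= sy < h and 0 <= sx < w:
                                acc += weight[o][i][ky][kx] * x[i][sy][sx]
                    out[o][y][xx] += acc  # sum contributions over input channels
    return out


# Each filter (row) mixes all rates; the pattern shifts cyclically per output channel.
for row in dilation_pattern(4, 8, [1, 2, 4, 8]):
    print(row)

# Impulse response: one input channel, two filters with identical 3x3 all-ones kernels.
x = [[[0.0] * 9 for _ in range(9)]]
x[0][4][4] = 1.0
ones = [[1.0] * 3 for _ in range(3)]
out = poly_scale_conv2d(x, [[ones], [ones]], rates=[1, 2])
print(sum(map(sum, out[0])), sum(map(sum, out[1])))  # 9.0 9.0
print(out[0][3][3], out[1][3][3])  # 1.0 0.0 (dilation 1 reaches offset 1)
print(out[0][2][2], out[1][2][2])  # 0.0 1.0 (dilation 2 reaches offset 2)
```

In the impulse test both filters use nine identical weights, yet filter 0 responds within a 3 x 3 window while filter 1 responds on a 5 x 5 grid spaced two pixels apart. This is the sense in which mixing dilation rates inside one layer gathers multi-scale context at no extra parameter cost. A practical implementation would avoid per-kernel Python loops, for example by grouping kernels that share a dilation rate into separate dilated (grouped) convolutions.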