CV | Dec 7, 2019

Dynamic Convolution: Attention over Convolution Kernels

arXiv:1912.03458v2 | 1328 citations
AI Analysis

This paper addresses the limited representation capability of resource-constrained CNNs. It offers a novel way to improve performance without increasing depth or width; the contribution is incremental but impactful for mobile and edge AI applications.

The paper tackles the performance degradation of light-weight CNNs by introducing Dynamic Convolution, which aggregates multiple convolution kernels dynamically based on input-dependent attention. Applied to MobileNetV3-Small, it boosts ImageNet top-1 accuracy by 2.9% with only 4% additional FLOPs and achieves a 2.9 AP gain on COCO keypoint detection.

Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited representation capability. To address this issue, we present Dynamic Convolution, a new design that increases model complexity without increasing the network depth or width. Instead of using a single convolution kernel per layer, dynamic convolution aggregates multiple parallel convolution kernels dynamically based upon their attentions, which are input dependent. Assembling multiple kernels is not only computationally efficient due to the small kernel size, but also has more representation power since these kernels are aggregated in a non-linear way via attention. By simply using dynamic convolution for the state-of-the-art architecture MobileNetV3-Small, the top-1 accuracy of ImageNet classification is boosted by 2.9% with only 4% additional FLOPs and 2.9 AP gain is achieved on COCO keypoint detection.
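The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses 1-D, single-channel signals in plain Python, and the attention branch (global average pool, one linear map, softmax with a temperature) is an assumption made for illustration. The names `attention`, `dynamic_conv1d`, `attn_weights`, and `attn_bias` are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d(signal, kernel):
    """'Valid' cross-correlation of a 1-D signal with a 1-D kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def attention(signal, attn_weights, attn_bias, temperature=1.0):
    """Hypothetical attention branch: one score per kernel, computed from
    the global average of the input, then normalized with softmax."""
    pooled = sum(signal) / len(signal)
    scores = [w * pooled + b for w, b in zip(attn_weights, attn_bias)]
    return softmax(scores, temperature)


def dynamic_conv1d(signal, kernels, attn_weights, attn_bias, temperature=1.0):
    """Aggregate K same-sized kernels with input-dependent attention,
    then run a single convolution with the aggregated kernel."""
    pi = attention(signal, attn_weights, attn_bias, temperature)
    size = len(kernels[0])
    aggregated = [
        sum(p * kern[j] for p, kern in zip(pi, kernels)) for j in range(size)
    ]
    return conv1d(signal, aggregated), pi


if __name__ == "__main__":
    kernels = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # K = 2 parallel kernels
    out, pi = dynamic_conv1d([1.0, 2.0, 3.0, 4.0, 5.0], kernels,
                             attn_weights=[1.0, -1.0], attn_bias=[0.0, 0.0])
    print("attention:", pi)
    print("output:", out)
```

Because the kernels are mixed before the convolution runs, the layer performs one convolution regardless of the number of kernels; the extra cost is the attention branch and the weighted sum over kernel weights, which is small when the kernels are small. The mixing weights depend on the input through the softmax, so the overall mapping from input to output is non-linear even though each kernel is a linear operator.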

Code Implementations: 5 repos
