CV · May 18, 2020

Cross-filter compression for CNN inference acceleration

arXiv:2005.09034v1
AI Analysis

This work addresses the problem of CNN inference acceleration for resource-constrained applications, offering a novel compression technique that is incremental over existing quantization methods.

The paper tackles the natural upper limit of filter-wise quantization for CNN acceleration by proposing a cross-filter compression method that quantizes all convolution filters and shares scaling factors among spatially adjacent filters, achieving ~32x memory savings and 122x speedup in convolution operations with tolerable accuracy loss on CIFAR-10 and ImageNet datasets.

Convolutional neural networks demonstrate great capability on many tasks, such as image classification. However, training a network requires substantial resources. Hence, much effort has gone into accelerating neural networks by reducing the precision of weights, activations, and gradients. However, these filter-wise quantization methods have a natural upper limit, caused by the size of the kernel. Meanwhile, with the growing popularity of small kernels, this limit decreases further. To address this issue, we propose a new cross-filter compression method that can provide $\sim32\times$ memory savings and $122\times$ speedup in convolution operations. In our method, all convolution filters are quantized to a given number of bits, and spatially adjacent filters share the same scaling factor. Our compression method, based on Binary-Weight and XNOR-Net separately, is evaluated on the CIFAR-10 and ImageNet datasets with widely used network structures, such as ResNet and VGG, and shows tolerable accuracy loss compared to state-of-the-art quantization methods.
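The idea of sharing one scaling factor across several filters can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes 1-bit weights, grouping of consecutive filters along the output-channel axis, a scale equal to the mean absolute weight of the group (the least-squares optimum for a sign approximation), and a storage model of one 32-bit scale per group. The names `binarize_shared_scale`, `dequantize`, `compression_ratio`, and `group_size` are hypothetical.

```python
import numpy as np


def binarize_shared_scale(weights, group_size):
    """Binarize conv weights, sharing one scale per group of filters.

    weights: array of shape (out_channels, in_channels, kh, kw).
    group_size: number of consecutive filters sharing a scale (assumption).
    Returns (signs, scales) with signs in {-1, +1} and one scale per group.
    """
    out_channels = weights.shape[0]
    if out_channels % group_size != 0:
        raise ValueError("out_channels must be divisible by group_size")
    signs = np.where(weights >= 0, 1.0, -1.0)
    # Each row holds all weights of one group of consecutive filters.
    grouped = np.abs(weights).reshape(out_channels // group_size, -1)
    # Mean |w| minimizes the squared error of scale * sign(w) over the group.
    scales = grouped.mean(axis=1)
    return signs, scales


def dequantize(signs, scales, group_size):
    """Rebuild approximate weights from signs and per-group scales."""
    per_filter = np.repeat(scales, group_size)
    return signs * per_filter[:, None, None, None]


def compression_ratio(weights_per_filter, group_size, full_bits=32):
    """Storage ratio vs. full precision, assuming 1 bit per weight
    plus one full-precision scale per group of filters."""
    n = weights_per_filter * group_size
    return full_bits * n / (n + full_bits)


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 2, 3, 3))
signs, scales = binarize_shared_scale(w, group_size=2)
approx = dequantize(signs, scales, group_size=2)
ratio_single = compression_ratio(9, group_size=1)
ratio_shared = compression_ratio(9, group_size=64)
```

Under this storage model, a filter holding only 9 weights with its own scale compresses by roughly 7x, because the 32-bit scale dominates the 9 weight bits. Sharing the scale across more filters moves the ratio toward the 32x bound, which is one way to read the "natural upper limit" described above.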

