CV, LG, NE · Mar 19, 2020

LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural Networks Based on Graphics Processing Units

arXiv:2003.08646v3 · 19 citations
AI Analysis

This work addresses the need for faster and more efficient neural network inference on GPUs, representing an incremental improvement by combining existing quantization and Winograd techniques.

The paper tackled the problem of accelerating deep convolutional neural networks by proposing LANCE, an efficient low-precision quantized Winograd convolution algorithm, which improved performance by up to 2.40x over full-precision convolution with trivial accuracy loss on image classification datasets.

Accelerating deep convolutional neural networks has become an active topic and sparked an interest in academia and industry. In this paper, we propose an efficient low-precision quantized Winograd convolution algorithm, called LANCE, which combines the advantages of fast convolution and quantization techniques. By embedding linear quantization operations into the Winograd-domain, the fast convolution can be performed efficiently under low-precision computation on graphics processing units. We test neural network models with LANCE on representative image classification datasets, including SVHN, CIFAR, and ImageNet. The experimental results show that our 8-bit quantized Winograd convolution improves the performance by up to 2.40x over the full-precision convolution with trivial accuracy loss.
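The idea of quantizing inside the Winograd domain can be illustrated with a minimal sketch. This is not the authors' GPU implementation: it assumes the one-dimensional F(2,3) Winograd tile, symmetric per-tensor linear quantization, and plain Python integers standing in for low-precision GPU arithmetic. All function names are hypothetical. The input and the filter are transformed first, the transformed values are quantized to 8 bits, the element-wise product is carried out on integers, and the result is rescaled before the output transform.

```python
# Illustrative sketch only: 1-D Winograd F(2,3) with linear quantization
# applied to the transformed (Winograd-domain) operands.

def quantize(values, bits=8):
    """Symmetric linear quantization (an assumed scheme): returns ints and a scale."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def input_transform(d):
    """B^T d for F(2,3): a 4-element input tile."""
    return [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]

def filter_transform(g):
    """G g for F(2,3): a 3-tap filter."""
    return [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]

def output_transform(m):
    """A^T m for F(2,3): two output values."""
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]

def direct_conv(d, g):
    """Reference: direct sliding-window correlation, two outputs."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

def winograd_f23(d, g):
    """Full-precision Winograd convolution."""
    u, v = filter_transform(g), input_transform(d)
    return output_transform([a * b for a, b in zip(u, v)])

def quantized_winograd_f23(d, g, bits=8):
    """Winograd convolution with the element-wise product done on integers."""
    u_q, u_scale = quantize(filter_transform(g), bits)
    v_q, v_scale = quantize(input_transform(d), bits)
    m_int = [a * b for a, b in zip(u_q, v_q)]       # integer multiplies
    m = [x * u_scale * v_scale for x in m_int]      # de-quantize
    return output_transform(m)

if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [0.5, -1.0, 0.25]
    print("direct:   ", direct_conv(d, g))
    print("winograd: ", winograd_f23(d, g))
    print("quantized:", quantized_winograd_f23(d, g))
```

In this sketch the full-precision Winograd result matches direct convolution up to floating-point rounding, while the 8-bit variant differs by a small quantization error. How scales are chosen and how the integer products are mapped onto GPU instructions are design decisions the abstract does not specify.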

