Enabling Sparse Winograd Convolution by Native Pruning
This work aims to accelerate convolution for CNN practitioners by combining sparse methods with Winograd convolution. The contribution is incremental in that it builds on two existing, orthogonal approaches.
The paper tackles the challenge of training sparse Winograd convolutions by introducing a native pruning method, which achieves over 90% sparsity with only 0.1% accuracy loss for AlexNet on ImageNet and up to a 5.4x speedup over a state-of-the-art dense convolution implementation on an Intel Xeon CPU.
Sparse methods and Winograd convolution are two orthogonal approaches, each of which significantly accelerates convolution computations in modern CNNs. Sparse Winograd convolution merges the two and thus has the potential to offer a combined performance benefit. Nevertheless, training convolution layers so that the resulting Winograd kernels are sparse has not hitherto been very successful. By introducing a Winograd layer in place of a standard convolution layer, we can learn and prune Winograd coefficients "natively" and obtain sparsity levels beyond 90% with only 0.1% accuracy loss for AlexNet on the ImageNet dataset. Furthermore, we present a sparse Winograd convolution algorithm and implementation that exploits this sparsity, achieving up to 31.7 effective TFLOP/s in 32-bit precision on a recent Intel Xeon CPU, which corresponds to a 5.4x speedup over a state-of-the-art dense convolution implementation.
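
To illustrate why sparsity has to be obtained in the Winograd domain itself, the following minimal NumPy sketch uses the standard F(2x2, 3x3) Winograd transform matrices. The tile size, the function names, and the simple magnitude-threshold pruning rule are assumptions made for illustration only; the abstract does not specify the paper's tile size, training procedure, or pruning schedule. The sketch shows that a spatial kernel with a single nonzero weight can become a much denser Winograd kernel, whereas pruning the Winograd coefficients directly yields sparsity exactly where the element-wise product is computed.

```python
import numpy as np

# Standard F(2x2, 3x3) Winograd transform matrices (assumed tile size).
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
BT = np.array([[1.0, 0.0, -1.0, 0.0],
               [0.0, 1.0, 1.0, 0.0],
               [0.0, -1.0, 1.0, 0.0],
               [0.0, 1.0, 0.0, -1.0]])
AT = np.array([[1.0, 1.0, 1.0, 0.0],
               [0.0, 1.0, -1.0, -1.0]])


def winograd_weights(g):
    """Transform a 3x3 spatial kernel into its 4x4 Winograd-domain kernel."""
    return G @ g @ G.T


def winograd_tile(d, U):
    """Compute a 2x2 output tile from a 4x4 input tile d and a 4x4
    Winograd-domain kernel U. Zeros in U are multiplications that a
    sparse implementation could skip."""
    V = BT @ d @ BT.T          # input transform
    return AT @ (U * V) @ AT.T  # element-wise product, then output transform


def direct_tile(d, g):
    """Reference: direct 3x3 correlation over the same 4x4 input tile."""
    out = np.empty((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j] = np.sum(d[i:i + 3, j:j + 3] * g)
    return out


def prune_by_magnitude(U, sparsity):
    """Hypothetical pruning rule: zero the smallest-magnitude fraction of U.
    This stands in for whatever criterion is used during training."""
    k = int(round(sparsity * U.size))
    if k == 0:
        return U.copy()
    threshold = np.sort(np.abs(U).ravel())[k - 1]
    return np.where(np.abs(U) <= threshold, 0.0, U)


# A spatial kernel with one nonzero weight out of nine ...
g_sparse = np.zeros((3, 3))
g_sparse[0, 0] = 1.0
# ... has 9 nonzero entries out of 16 after the Winograd transform.
U_from_spatial = winograd_weights(g_sparse)

# Pruning Winograd coefficients directly gives sparsity in the domain
# where the multiplications actually happen.
rng = np.random.default_rng(0)
U_native = prune_by_magnitude(rng.standard_normal((4, 4)), sparsity=0.75)
```

Because the 4x4 Winograd kernel has 16 coefficients while a 3x3 spatial kernel has only 9, a pruned Winograd kernel such as `U_native` generally does not correspond to any spatial kernel. This is why the sketch treats the Winograd coefficients as the parameters to be pruned, in the spirit of the Winograd layer described above.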