ILP-M Conv: Optimize Convolution Algorithm for Single-Image Convolution Neural Network Inference on Mobile GPUs
This work addresses a domain-specific performance bottleneck: real-time, single-image CNN inference in mobile applications.
The paper tackles the problem of inefficient single-image CNN inference on mobile GPUs by proposing the HNTMP convolution algorithm, which achieves a 14.6x speedup over the im2col method and a 2.30x speedup over the fastest existing direct convolution algorithm.
Convolutional neural networks are widely used in mobile applications. However, existing GPU convolution algorithms are designed for mini-batch neural network training, and single-image convolutional neural network inference on mobile GPUs is not well studied. After discussing this difference in usage and examining the existing convolution algorithms, we propose the HNTMP convolution algorithm. HNTMP achieves a $14.6 \times$ speedup over the most popular convolution algorithm, \textit{im2col}, and a $2.30 \times$ speedup over direct convolution, the fastest existing convolution algorithm as far as we know.
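For readers unfamiliar with the two baselines named above, the following is a minimal pure-Python sketch of single-image direct convolution and im2col convolution. It is not the HNTMP algorithm and not GPU code. The function names, the nested-list tensor layout, and the stride-1, no-padding setting are assumptions chosen for illustration only.

```python
def _shapes(image, kernels):
    # image is [C][H][W]; kernels is [K][C][R][S] (assumed layout for this sketch)
    C, H, W = len(image), len(image[0]), len(image[0][0])
    K, R, S = len(kernels), len(kernels[0][0]), len(kernels[0][0][0])
    return C, H, W, K, R, S


def direct_conv(image, kernels):
    """Direct convolution: accumulate each output pixel straight from the input."""
    C, H, W, K, R, S = _shapes(image, kernels)
    out_h, out_w = H - R + 1, W - S + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(K)]
    for k in range(K):
        for y in range(out_h):
            for x in range(out_w):
                acc = 0.0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += image[c][y + r][x + s] * kernels[k][c][r][s]
                out[k][y][x] = acc
    return out


def im2col_conv(image, kernels):
    """im2col convolution: unroll input patches, then do one matrix multiply."""
    C, H, W, K, R, S = _shapes(image, kernels)
    out_h, out_w = H - R + 1, W - S + 1
    # One column per output pixel, each holding the C*R*S inputs it depends on.
    # This unrolled matrix duplicates overlapping input values.
    cols = [
        [image[c][y + r][x + s]
         for c in range(C) for r in range(R) for s in range(S)]
        for y in range(out_h) for x in range(out_w)
    ]
    # One row per filter, flattened in the same (c, r, s) order.
    rows = [
        [kernels[k][c][r][s]
         for c in range(C) for r in range(R) for s in range(S)]
        for k in range(K)
    ]
    # Matrix multiply: (K x CRS) times (CRS x out_h*out_w).
    flat = [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in rows]
    # Reshape each filter's flat output back to out_h x out_w.
    return [
        [flat[k][y * out_w:(y + 1) * out_w] for y in range(out_h)]
        for k in range(K)
    ]


image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # one channel, 3x3
kernels = [[[[1, 0], [0, 1]]]]                # one 2x2 filter
assert direct_conv(image, kernels) == im2col_conv(image, kernels)
```

Both functions compute the same result. The difference is that im2col first materializes an unrolled copy of the input so the work becomes a single matrix multiplication, while direct convolution reads the input in place.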