DC · CV · PF · Sep 6, 2019

ILP-M Conv: Optimize Convolution Algorithm for Single-Image Convolution Neural Network Inference on Mobile GPUs

arXiv:1909.02765v2 · 3 citations
AI Analysis

This work targets a performance bottleneck in mobile applications that depend on real-time, single-image CNN inference; its contribution is specific to that domain.

The paper tackles the problem of inefficient single-image CNN inference on mobile GPUs by proposing the HNTMP convolution algorithm, which achieves a 14.6x speedup over the im2col method and a 2.30x speedup over the fastest existing direct convolution algorithm.

Convolution neural networks are widely used in mobile applications. However, GPU convolution algorithms are designed for mini-batch neural network training, and single-image convolution neural network inference on mobile GPUs is not well studied. After discussing the difference in usage and examining the existing convolution algorithms, we propose the HNTMP convolution algorithm. To the best of our knowledge, the HNTMP convolution algorithm achieves a $14.6 \times$ speedup over the most popular convolution algorithm, \textit{im2col}, and a $2.30 \times$ speedup over the fastest existing convolution algorithm (direct convolution).
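For readers unfamiliar with the two baselines named in the abstract, the following is a minimal pure-Python sketch of single-image direct convolution and of im2col followed by a matrix multiply. It is not the paper's HNTMP algorithm and says nothing about GPU performance; the function names, the `[C][H][W]` and `[K][C][R][S]` layouts, stride 1, and the absence of padding are all assumptions made for illustration. As is usual in CNNs, "convolution" here means cross-correlation (the kernel is not flipped).

```python
# Illustrative baselines only (hypothetical names, stride 1, no padding).
# image: [C][H][W] nested lists; kernels: [K][C][R][S] nested lists.

def direct_conv(image, kernels):
    """Direct convolution: loop over every output element and accumulate."""
    C, H, W = len(image), len(image[0]), len(image[0][0])
    K, R, S = len(kernels), len(kernels[0][0]), len(kernels[0][0][0])
    out_h, out_w = H - R + 1, W - S + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(K)]
    for k in range(K):
        for y in range(out_h):
            for x in range(out_w):
                acc = 0.0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += image[c][y + r][x + s] * kernels[k][c][r][s]
                out[k][y][x] = acc
    return out


def im2col(image, R, S):
    """Unroll all RxS patches into a (C*R*S) x (out_h*out_w) matrix."""
    C, H, W = len(image), len(image[0]), len(image[0][0])
    out_h, out_w = H - R + 1, W - S + 1
    cols = []
    for c in range(C):
        for r in range(R):
            for s in range(S):
                cols.append([image[c][y + r][x + s]
                             for y in range(out_h) for x in range(out_w)])
    return cols


def im2col_conv(image, kernels):
    """Convolution as a matrix product: (K x CRS) times (CRS x out_h*out_w)."""
    C, H, W = len(image), len(image[0]), len(image[0][0])
    K, R, S = len(kernels), len(kernels[0][0]), len(kernels[0][0][0])
    out_h, out_w = H - R + 1, W - S + 1
    cols = im2col(image, R, S)
    # Flatten each filter in the same (c, r, s) order used by im2col.
    rows = [[kernels[k][c][r][s]
             for c in range(C) for r in range(R) for s in range(S)]
            for k in range(K)]
    flat = [[sum(w * col[j] for w, col in zip(row, cols))
             for j in range(out_h * out_w)]
            for row in rows]
    # Reshape each flat output row back to out_h x out_w.
    return [[f[y * out_w:(y + 1) * out_w] for y in range(out_h)] for f in flat]
```

Both functions compute the same result. The im2col variant first materializes an unrolled copy of the input in which each pixel can appear up to R*S times, which is extra memory traffic that a direct convolution does not incur.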

Code Implementations: 1 repo
