Optimization of XNOR Convolution for Binary Convolutional Neural Networks on GPU
This work addresses the need for faster deployment of computer vision applications on limited-capacity embedded devices. The contribution is incremental, since it focuses on optimizing an existing method.
The study tackles efficient binary convolutional neural network inference on GPUs by optimizing XNOR convolution, reporting a speed-up of up to 42.61× with a 3×3 kernel.
Binary convolutional networks have a lower computational load and a smaller memory footprint than their full-precision counterparts. They are therefore a feasible alternative for deploying computer vision applications on limited-capacity embedded devices. Once trained in less resource-constrained computing environments, they can be deployed for real-time inference on such devices. In this study, we propose an implementation of binary convolutional network inference on GPU, focusing on the optimization of XNOR convolution. Experimental results show that using a GPU can provide a speed-up of up to $42.61\times$ with a kernel size of $3\times3$. The implementation is publicly available at https://github.com/metcan/Binary-Convolutional-Neural-Network-Inference-on-GPU
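
To illustrate the operation being optimized, the following minimal sketch shows in plain Python how a dot product between two vectors of $\pm 1$ values reduces to an XNOR followed by a bit count. It is not the GPU implementation from the repository above. The bit encoding ($+1$ as bit 1, $-1$ as bit 0) and the helper names `pack_signs` and `xnor_dot` are assumptions made for this example.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an integer.

    Assumed encoding for this sketch: bit i is 1 when values[i] is +1
    and 0 when values[i] is -1.
    """
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR sets a bit wherever the two signs agree. Each agreement
    contributes +1 and each disagreement -1, so the dot product is
    agreements - (n - agreements) = 2 * agreements - n.
    """
    mask = (1 << n) - 1                 # keep only the n valid bits
    agree = ~(a_bits ^ b_bits) & mask   # XNOR restricted to n bits
    return 2 * bin(agree).count("1") - n


# One flattened 3x3 window and one flattened 3x3 binary kernel (9 values each).
window = [1, -1, 1, 1, -1, -1, 1, -1, 1]
kernel = [1, 1, -1, 1, -1, 1, 1, -1, -1]

result = xnor_dot(pack_signs(window), pack_signs(kernel), len(window))
reference = sum(x * y for x, y in zip(window, kernel))
print(result, reference)
```

For a $3\times3$ kernel, each output value of a single-channel binary convolution is one such 9-element dot product, so the multiply-accumulate loop is replaced by one XNOR and one population count per packed word.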