Large Scale Artificial Neural Network Training Using Multi-GPUs
This work addresses the computational bottleneck of training large neural networks, a concern for both researchers and practitioners, though it appears incremental because it builds on existing matrix multiplication techniques.
The paper accelerates large-scale artificial neural network training by reducing the forward and backward passes to matrix multiplication; its out-of-core multi-GPU matrix multiplication algorithm achieves linear speedup on multiple inhomogeneous GPUs.
This paper describes a method for accelerating the training of large-scale Artificial Neural Networks (ANNs) on multiple GPUs by reducing the forward and backward passes to matrix multiplication. We propose an out-of-core multi-GPU matrix multiplication algorithm and integrate it with ANN training. Experiments demonstrate that our matrix multiplication algorithm achieves linear speedup on multiple inhomogeneous GPUs. The full paper for this project can be found at [1].
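To make the reduction concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single fully connected layer without bias or activation, so the forward pass is Y = XW and the backward pass is dW = XᵀdY and dX = dY Wᵀ. It also assumes one plausible partitioning scheme: the rows of the left operand are divided among devices in proportion to a relative speed weight, and each share is processed in small tiles, as an out-of-core scheme would when an operand does not fit in device memory. All function names and the `weights` and `tile_rows` parameters are hypothetical, and everything runs on the CPU in plain Python; in a real setting each tile would be copied to a GPU and multiplied there.

```python
def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(r) for r in zip(*m)]


def split_rows(n_rows, weights):
    """Split range(n_rows) into contiguous (start, end) chunks whose sizes are
    roughly proportional to the integer weights (hypothetical device speeds)."""
    total = sum(weights)
    bounds, start, acc = [], 0, 0
    for w in weights:
        acc += w
        end = (n_rows * acc) // total  # integer arithmetic, last end == n_rows
        bounds.append((start, end))
        start = end
    return bounds


def partitioned_matmul(a, b, weights, tile_rows=2):
    """Compute a @ b one row block at a time.

    Each simulated device receives a contiguous share of the rows of `a`
    and works through it in tiles of at most `tile_rows` rows, so only one
    tile of `a` and one tile of the result are needed at a time.
    """
    out = []
    for start, end in split_rows(len(a), weights):
        for t in range(start, end, tile_rows):
            tile = a[t:min(t + tile_rows, end)]
            out.extend(matmul(tile, b))  # stand-in for a GPU GEMM call
    return out


def linear_forward(x, w, weights):
    """Forward pass of a bias-free linear layer: Y = X W."""
    return partitioned_matmul(x, w, weights)


def linear_backward(x, w, grad_y, weights):
    """Backward pass: dW = X^T dY and dX = dY W^T, both as matrix products."""
    grad_w = partitioned_matmul(transpose(x), grad_y, weights)
    grad_x = partitioned_matmul(grad_y, transpose(w), weights)
    return grad_w, grad_x
```

With `weights=[3, 1]`, for example, the first simulated device receives about three quarters of the rows. Because the row blocks are independent, the result is identical to an unpartitioned product; this independence is what allows work to be balanced across devices of different speeds.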