An optimal scheduling architecture for accelerating batch algorithms on Neural Network processor architectures
This work addresses a domain-specific problem for neural network practitioners by providing an incremental improvement in scheduling efficiency for batch algorithms on HPC architectures.
The paper tackles the problem of inefficient batch scheduling in neural network algorithms by proposing an optimal scheduling architecture that can be implemented in hardware or software, resulting in significant reductions in training and inference time compared to previous solutions.
In neural network topologies, algorithms run on batches of data tensors. These batches are typically scheduled onto computing cores that execute in parallel. Algorithms that operate on batches therefore need a scheduling architecture that makes good use of hardware resources, which in turn significantly reduces training and inference time. In this paper, we propose to accelerate batch algorithms for neural networks through a scheduling architecture that enables optimal utilization of compute power. The proposed architecture can be built into hardware or implemented in software alone. The results demonstrate that the proposed architecture speeds up batch algorithms compared to previous solutions. The proposed idea applies to any HPC architecture meant for neural networks.
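To make the scheduling problem concrete, the following is a minimal software-only sketch of assigning batches to parallel cores. It is not the architecture proposed in this paper, whose details are not given in the abstract. It uses a standard greedy heuristic (longest-processing-time-first) as a stand-in, and the function name `schedule_batches`, the per-batch cost estimates, and the core count are all hypothetical.

```python
import heapq


def schedule_batches(batch_costs, num_cores):
    """Assign batches to cores with a greedy longest-processing-time-first rule.

    Illustrative stand-in only, not the paper's scheduler.
    batch_costs: estimated compute cost of each batch (hypothetical units).
    num_cores:   number of cores that execute in parallel.
    Returns (assignment, makespan), where assignment[c] lists the batch
    indices placed on core c and makespan is the load of the busiest core.
    """
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")

    # Min-heap of (current load, core id): the least-loaded core pops first.
    heap = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_cores)]

    # Place the most expensive batches first so that the smaller ones
    # can fill the remaining gaps between cores.
    order = sorted(range(len(batch_costs)),
                   key=lambda i: batch_costs[i], reverse=True)
    for i in order:
        load, core = heapq.heappop(heap)
        assignment[core].append(i)
        heapq.heappush(heap, (load + batch_costs[i], core))

    makespan = max(load for load, _ in heap)
    return assignment, makespan


if __name__ == "__main__":
    assignment, makespan = schedule_batches([4, 3, 3, 2, 2, 2], num_cores=2)
    print(assignment, makespan)
```

The makespan (the finishing time of the most heavily loaded core) is a common proxy for how well compute power is utilized: the closer it is to the total cost divided by the number of cores, the less time cores spend idle. This greedy rule is only a heuristic and does not guarantee an optimal schedule in general.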