Performance Modelling of Deep Learning on Intel Many Integrated Core Architectures
This work addresses the need for efficient performance estimation in deep learning training on parallel architectures, but it is incremental as it builds on existing modeling approaches.
The authors tackle the problem of predicting the execution time of training convolutional neural networks on the Intel Many Integrated Core architecture by developing two parameterized performance models, for which they report average prediction accuracies of about 15% and 11%, respectively.
Many complex problems, such as natural language processing or visual object detection, are solved using deep learning. However, efficient training of complex deep convolutional neural networks on large data sets is computationally demanding and requires parallel computing resources. In this paper, we present two parameterized performance models for estimating the execution time of training convolutional neural networks on the Intel Many Integrated Core architecture. The first model relies only minimally on measurements to estimate parameter values, whereas the second model estimates more of its parameters from measurements. We evaluate the prediction accuracy of both performance models in the context of training three different convolutional neural network architectures on the Intel Xeon Phi. The achieved average performance prediction accuracy is about 15% for the first model and 11% for the second model.