Towards a learning-based performance modeling for accelerating Deep Neural Networks
This work addresses performance optimization for deep learning applications on ARM GPUs, but its contribution is incremental: it applies existing machine learning techniques to a specific domain.
The paper tackles the problem of optimizing Convolutional Neural Networks (CNNs) by developing a learning-based performance model that selects the best convolution operator implementation, and shows that this model outperforms the manual selections made by the ARM Compute Library on a Mali GPU.
Emerging applications such as Deep Learning are often data-driven, so traditional approaches based on auto-tuners do not deliver good performance across the wide range of inputs used in practice. In this paper, we begin an investigation of predictive models based on machine learning techniques to optimize Convolutional Neural Networks (CNNs). As a use case, we focus on the ARM Compute Library, which provides three different implementations of the convolution operator at different numeric precisions. Starting from a collection of benchmarks, we build and validate models learned by Decision Tree and naive Bayesian classifiers. Preliminary experiments on a Midgard-based ARM Mali GPU show that our predictive model outperforms all the convolution operators manually selected by the library.
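The idea of learning which convolution implementation to pick from benchmark data can be illustrated with a minimal sketch. It is not the authors' model: the feature set (input size, kernel size, channel count), the labels `impl_a`, `impl_b`, `impl_c`, and every benchmark record below are hypothetical placeholders, and the tree builder is a plain Gini-based decision tree written only for illustration.

```python
from collections import Counter

# Hypothetical benchmark records: (input_size, kernel_size, channels) -> fastest
# implementation. Values and labels are invented for illustration; they are
# NOT measurements from the ARM Compute Library or from the paper.
BENCHMARKS = [
    ((32, 1, 16), "impl_a"),
    ((64, 1, 32), "impl_a"),
    ((128, 1, 64), "impl_a"),
    ((32, 3, 16), "impl_b"),
    ((64, 3, 32), "impl_b"),
    ((128, 3, 64), "impl_c"),
    ((224, 3, 64), "impl_c"),
    ((224, 5, 32), "impl_c"),
]


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows):
    """Return the (feature_index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini([label for _, label in rows])
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        # Candidate thresholds are midpoints between consecutive distinct values,
        # so both sides of every candidate split are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for x, label in rows if x[f] <= t]
            right = [label for x, label in rows if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, depth=0, max_depth=3):
    """Recursively build a tree; leaves are labels, inner nodes are 4-tuples."""
    split = best_split(rows) if depth < max_depth else None
    if split is None:
        # Leaf: predict the majority label of the remaining rows.
        return Counter(label for _, label in rows).most_common(1)[0][0]
    f, t = split
    left = [row for row in rows if row[0][f] <= t]
    right = [row for row in rows if row[0][f] > t]
    return (f, t,
            build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))


def predict(tree, x):
    """Walk the tree until a leaf (a label string) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


tree = build_tree(BENCHMARKS)
# Select an implementation for a layer configuration not in the benchmark set.
choice = predict(tree, (256, 3, 128))
```

At run time, a library wrapper could call `predict` once per layer configuration and dispatch to the selected operator. A naive Bayesian classifier, the other model the abstract mentions, could be trained on the same records in place of the tree.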