A New Clustering-Based Technique for the Acceleration of Deep Convolutional Networks
This work addresses the need for efficient deep learning models in real-time, mobile applications, but it appears incremental as it builds on existing model compression and acceleration techniques.
The paper tackles the high computational and storage demands of deep neural networks, which are especially limiting in resource-constrained settings such as mobile devices. It proposes a clustering-based model compression and acceleration technique that achieves acceleration gains over conventional k-means approaches, validated through extensive evaluation on state-of-the-art DNN models for image classification.
Deep learning, and especially the use of Deep Neural Networks (DNNs), provides impressive results in various regression and classification tasks. However, achieving these results places high demands on computational and storage resources. This becomes problematic when, for instance, real-time mobile applications are considered, in which the (embedded) devices involved have limited resources. A common way of addressing this problem is to transform the original large pre-trained networks into new, smaller models by utilizing Model Compression and Acceleration (MCA) techniques. Within the MCA framework, we propose a clustering-based approach that is able to increase the number of employed centroids/representatives while, at the same time, achieving an acceleration gain compared to conventional $k$-means-based approaches. This is achieved by imposing a special structure on the employed representatives, which is enabled by the particularities of the problem at hand. Moreover, the theoretical acceleration gains are presented, and the key system hyper-parameters that affect those gains are identified. Extensive evaluation studies, carried out using various state-of-the-art DNN models trained for image classification, validate the superiority of the proposed method for MCA tasks.
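To make the conventional baseline concrete, the following is a minimal sketch of scalar $k$-means weight sharing, assuming a single weight vector and a plain Lloyd's iteration. It illustrates only the generic idea that clustering weights lets a dot product use one multiplication per centroid rather than one per weight. It is not the structured-representative method proposed in the paper, whose details are not given here. The function names `kmeans_1d` and `shared_dot`, the vector length, and the choice of 8 centroids are illustrative assumptions.

```python
import numpy as np


def kmeans_1d(values, k, iters=20):
    """Plain Lloyd's k-means on scalars; returns (centroids, labels)."""
    # Assumption: initialise centroids evenly across the value range.
    centroids = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            members = values[labels == j]
            if members.size:  # keep the old centroid if the cluster is empty
                centroids[j] = members.mean()
    # Final assignment so labels match the returned centroids.
    labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
    return centroids, labels


def shared_dot(centroids, labels, x):
    """Dot product with shared weights: k multiplications instead of len(x)."""
    # Accumulate the inputs that share a centroid, then multiply once per centroid.
    sums = np.bincount(labels, weights=x, minlength=centroids.size)
    return float(centroids @ sums)


rng = np.random.default_rng(0)
w = rng.normal(size=256)  # hypothetical weight vector
x = rng.normal(size=256)  # hypothetical input vector

centroids, labels = kmeans_1d(w, k=8)
w_quantized = centroids[labels]            # weights replaced by their centroids
approx = shared_dot(centroids, labels, x)  # equals w_quantized @ x up to rounding
```

In this sketch the number of multiplications per output drops from the vector length to the number of centroids, which is why adding centroids normally trades accuracy against speed in the conventional setting.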