CV Aug 26, 2016

Scalable Compression of Deep Neural Networks

arXiv:1608.07365v12.14 citations

Originality: Incremental advance

AI Analysis

This work addresses storage and update challenges for mobile and embedded systems, but it is incremental because it builds on existing quantization and retraining techniques.

The paper tackles the problem of deploying large deep neural networks on resource-constrained devices by proposing a scalable compression method that allows selection of bit rates based on storage limits, achieving graceful performance degradation in experiments.

Deep neural networks generally involve some layers with millions of parameters, making them difficult to deploy and update on devices with limited resources such as mobile phones and other smart embedded systems. In this paper, we propose a scalable representation of the network parameters, so that different applications can select the most suitable bit rate of the network based on their own storage constraints. Moreover, when a device needs to upgrade to a high-rate network, the existing low-rate network can be reused, and only some incremental data need to be downloaded. We first hierarchically quantize the weights of a pre-trained deep neural network to enforce weight sharing. Next, we adaptively select the bits assigned to each layer given the total bit budget. After that, we retrain the network to fine-tune the quantized centroids. Experimental results show that our method can achieve scalable compression with graceful degradation in performance.
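The idea of reusing a low-rate network and downloading only incremental data can be illustrated with a small sketch. The code below is not the paper's algorithm: it assumes a simple multi-stage residual quantizer built on 1-D k-means, and the names `kmeans_1d`, `hierarchical_quantize`, and `reconstruct` are hypothetical. It covers only the hierarchical quantization step for a single weight matrix; per-layer bit allocation and retraining of the centroids are left out.

```python
import numpy as np

def kmeans_1d(values, n_clusters, n_iters=20):
    """Plain Lloyd's algorithm on scalars; returns (centroids, codes)."""
    # Assumption: centroids start evenly spaced over the value range.
    centroids = np.linspace(values.min(), values.max(), n_clusters)
    for _ in range(n_iters):
        # Assign every value to its nearest centroid.
        codes = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each non-empty centroid to the mean of its members.
        for k in range(n_clusters):
            members = values[codes == k]
            if members.size:
                centroids[k] = members.mean()
    return centroids, codes

def hierarchical_quantize(weights, bits_per_stage):
    """Quantize weights stage by stage; each stage encodes the residual
    left over by the previous stages (illustrative scheme only)."""
    residual = weights.ravel().copy()
    stages = []
    for bits in bits_per_stage:
        centroids, codes = kmeans_1d(residual, 2 ** bits)
        stages.append((centroids, codes))
        residual = residual - centroids[codes]
    return stages

def reconstruct(stages, shape, n_stages):
    """Rebuild the weights from the first n_stages stages."""
    flat = np.zeros(int(np.prod(shape)))
    for centroids, codes in stages[:n_stages]:
        flat = flat + centroids[codes]
    return flat.reshape(shape)

# Toy "layer": random weights standing in for a pre-trained matrix.
rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 32))

stages = hierarchical_quantize(weights, bits_per_stage=[2, 2])
low = reconstruct(stages, weights.shape, n_stages=1)   # low-rate network
high = reconstruct(stages, weights.shape, n_stages=2)  # after the increment

err_low = np.mean((weights - low) ** 2)
err_high = np.mean((weights - high) ** 2)
print(f"MSE with stage 1 only: {err_low:.4f}")
print(f"MSE with stages 1 and 2: {err_high:.4f}")
```

In this sketch a device holding only the first stage can upgrade by fetching the second stage's codebook and codes, without re-downloading the first. Because each stage's centroids are the means of their assigned residuals, adding a stage cannot increase the reconstruction error on the weights.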

