CV · Nov 18, 2020

Layer-Wise Data-Free CNN Compression

Maxwell Horton, Yanzi Jin, Ali Farhadi, Mohammad Rastegari

arXiv:2011.09058v3 · 9.1 · 19 citations

Originality: Highly original

AI Analysis

This work provides a data-free neural network compression method that is more efficient and more accurate than related data-free approaches, making it useful for deploying models in data-sensitive or resource-constrained environments.

This paper introduces a computationally efficient method for compressing trained neural networks without real data by breaking the problem into independent layer-wise compressions. The method generates layer-wise training data from the pretrained network itself. For 8-bit quantization of MobileNetV2 on ImageNet, it improves on existing methods by +0.34%, and by up to +28.50% at 5 bits, while using orders of magnitude less compute. For pruning, it reaches 1.5 times the sparsity rate of baselines with a similar compute budget at the same accuracy.
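To make the bit-width trend concrete, here is a minimal Python sketch of uniform weight quantization. It assumes a symmetric, per-tensor quantizer (the summary does not say which scheme the paper uses), and `quantize_symmetric` and the Gaussian toy weights are hypothetical. It only illustrates why rounding error grows as the bit-width drops; it does not reproduce how the paper recovers accuracy.

```python
import random

def quantize_symmetric(weights, bits):
    """Symmetric per-tensor uniform quantization (an assumed scheme, for illustration)."""
    qmax = 2 ** (bits - 1) - 1  # 127 levels each side at 8 bits, 15 at 5 bits
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:
        return list(weights)
    # Round each weight to the nearest integer level, clamp, then map back to a float.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
weights = [random.gauss(0.0, 0.1) for _ in range(1000)]  # hypothetical toy weights
err8 = mse(weights, quantize_symmetric(weights, 8))
err5 = mse(weights, quantize_symmetric(weights, 5))
print(f"8-bit MSE: {err8:.2e}, 5-bit MSE: {err5:.2e}")
```

Going from 8 to 5 bits makes the quantization step 127/15 (about 8.5) times larger, so the rounding error grows sharply; this is one intuition for why there is more accuracy to recover at low bit-widths.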

We present a computationally efficient method for compressing a trained neural network without using real data. We break the problem of data-free network compression into independent layer-wise compressions. We show how to efficiently generate layer-wise training data using only a pretrained network. We use this data to perform independent layer-wise compressions on the pretrained network. We also show how to precondition the network to improve the accuracy of our layer-wise compression method. We present results for layer-wise compression using quantization and pruning. When quantizing, we compress with higher accuracy than related works while using orders of magnitude less compute. When compressing MobileNetV2 and evaluating on ImageNet, our method outperforms existing methods for quantization at all bit-widths, achieving a $+0.34\%$ improvement in $8$-bit quantization, and a stronger improvement at lower bit-widths (up to a $+28.50\%$ improvement at $5$ bits). When pruning, we outperform baselines of a similar compute envelope, achieving $1.5$ times the sparsity rate at the same accuracy. We also show how to combine our efficient method with high-compute generative methods to improve upon their results.
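The layer-wise idea can be sketched for a single fully connected layer. This is an illustrative sketch, not the authors' implementation: it assumes synthetic inputs are drawn from per-channel Gaussian statistics stored in the pretrained network (for example, a preceding batch-norm layer's shift and scale) and passed through a ReLU, and it uses magnitude pruning followed by a gradient-descent refit that makes the compressed layer reproduce the original layer's outputs. All function names, statistics, and hyperparameters are hypothetical.

```python
import random

# Illustrative sketch only. Assumptions (not from the paper): one dense layer,
# inputs synthesized from hypothetical per-channel Gaussian statistics followed
# by a ReLU, magnitude pruning, and a gradient-descent refit.

def sample_layer_inputs(means, stds, n, rng):
    """Synthesize n input vectors for one layer without real data (assumed scheme)."""
    return [[max(0.0, rng.gauss(m, s)) for m, s in zip(means, stds)]
            for _ in range(n)]

def linear(weights, x):
    """Apply a dense layer; `weights` is a list of output rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def magnitude_prune_mask(weights, sparsity):
    """Return a 0/1 mask that zeros the smallest-magnitude `sparsity` fraction."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(sparsity * len(flat))
    threshold = flat[k - 1] if k > 0 else -1.0
    return [[1.0 if abs(w) > threshold else 0.0 for w in row] for row in weights]

def apply_mask(weights, mask):
    return [[w * m for w, m in zip(row, mrow)] for row, mrow in zip(weights, mask)]

def layer_mse(w_ref, w_new, inputs):
    """Mean (over inputs) squared difference between two layers' outputs."""
    total = 0.0
    for x in inputs:
        total += sum((a - b) ** 2
                     for a, b in zip(linear(w_ref, x), linear(w_new, x)))
    return total / len(inputs)

def compress_layer(weights, inputs, sparsity, lr=0.01, steps=100):
    """Prune one layer, then refit its surviving weights so its outputs match
    the original layer's outputs on the synthetic inputs."""
    mask = magnitude_prune_mask(weights, sparsity)
    student = apply_mask(weights, mask)
    targets = [linear(weights, x) for x in inputs]  # teacher = original layer
    n = len(inputs)
    for _ in range(steps):
        grads = [[0.0] * len(row) for row in student]
        for x, t in zip(inputs, targets):
            for i, (yi, ti) in enumerate(zip(linear(student, x), t)):
                coef = 2.0 * (yi - ti) / n  # d(loss)/d(output i)
                for j, xj in enumerate(x):
                    grads[i][j] += coef * xj
        # Update only unpruned weights; the mask keeps pruned ones at zero.
        student = [[(w - lr * g) * m for w, g, m in zip(row, grow, mrow)]
                   for row, grow, mrow in zip(student, grads, mask)]
    return student, mask

rng = random.Random(0)
n_in, n_out = 8, 4
original = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
# Hypothetical statistics standing in for values read from the pretrained network.
inputs = sample_layer_inputs([0.5] * n_in, [1.0] * n_in, 64, rng)

pruned_only = apply_mask(original, magnitude_prune_mask(original, 0.5))
refit, mask = compress_layer(original, inputs, sparsity=0.5)
print("pruned only:   ", layer_mse(original, pruned_only, inputs))
print("pruned + refit:", layer_mse(original, refit, inputs))
```

Because each layer needs only its own synthetic inputs and its original weights to produce targets, every layer can be compressed without touching the others. Replacing the pruning mask with a quantizer would give a quantization variant of the same loop; the preconditioning step mentioned in the abstract is not modeled here.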

