LG, CV, ML · Nov 6, 2019

A Programmable Approach to Neural Network Compression

Vinu Joseph, Saurav Muralidharan, Animesh Garg, Michael Garland, Ganesh Gopalakrishnan

arXiv:1911.02497

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient, automated model compression among practitioners who deploy DNNs on resource-constrained hardware. It is an incremental advance, because it builds on existing compression techniques such as weight pruning and quantization.

The paper tackles the costly, largely manual search for good compression strategies for deep neural networks. It introduces Condensa, a programmable system that automatically infers desirable sparsities using Bayesian optimization, and reports up to 188x memory footprint reduction and 2.59x runtime throughput improvement in experiments on four real-world DNNs, using at most ten samples per search.
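To make the "infer a target sparsity under a small sample budget" setup concrete, here is a minimal sketch. It deliberately swaps in a much simpler technique than the paper's: plain bisection over density (the fraction of weights kept), which assumes accuracy never increases as more weights are removed. It is not Condensa's Bayesian optimization algorithm. `search_density` and `accuracy_at` are hypothetical names; in a real pipeline each `accuracy_at` call would mean compressing the model at that density, possibly fine-tuning it, and evaluating it.

```python
from typing import Callable


def search_density(accuracy_at: Callable[[float], float],
                   min_accuracy: float,
                   budget: int = 10) -> float:
    """Hypothetical stand-in for a sparsity search (bisection, NOT Bayesian optimization).

    Returns the smallest density found whose accuracy is >= min_accuracy,
    using at most `budget` calls to `accuracy_at`.

    Assumptions: accuracy is non-decreasing in density, density 1.0
    (the uncompressed model) meets the constraint, and density 0.0 does not.
    Neither endpoint is evaluated.
    """
    lo, hi = 0.0, 1.0  # invariant: lo is infeasible, hi is feasible
    for _ in range(budget):
        mid = (lo + hi) / 2
        if accuracy_at(mid) >= min_accuracy:
            hi = mid  # feasible: try keeping even fewer weights
        else:
            lo = mid  # infeasible: need to keep more weights
    return hi
```

For example, with a toy model whose accuracy jumps from 0.5 to 0.76 once at least a quarter of the weights are kept, `search_density(lambda d: 0.76 if d >= 0.25 else 0.5, 0.75)` settles on density 0.25, i.e. sparsity 0.75. Bisection needs monotonicity; a Bayesian approach like the one the paper describes does not, which is one reason to prefer it when accuracy curves are noisy.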

Deep neural networks (DNNs) frequently contain far more weights, represented at a higher precision, than are required for the specific task which they are trained to perform. Consequently, they can often be compressed using techniques such as weight pruning and quantization that reduce both the model size and inference time without appreciable loss in accuracy. However, finding the best compression strategy and corresponding target sparsity for a given DNN, hardware platform, and optimization objective currently requires expensive, frequently manual, trial-and-error experimentation. In this paper, we introduce a programmable system for model compression called Condensa. Users programmatically compose simple operators, in Python, to build more complex and practically interesting compression strategies. Given a strategy and user-provided objective (such as minimization of running time), Condensa uses a novel Bayesian optimization-based algorithm to automatically infer desirable sparsities. Our experiments on four real-world DNNs demonstrate memory footprint and hardware runtime throughput improvements of 188x and 2.59x, respectively, using at most ten samples per search. We have released a reference implementation of Condensa at https://github.com/NVlabs/condensa.
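The abstract says users compose simple operators to build richer compression strategies. The sketch below illustrates that idea on a flat list of weights: a magnitude-pruning operator and a uniform-grid quantizer, chained by a `compose` helper. All names (`prune`, `quantize`, `compose`) are hypothetical and are not Condensa's actual API; the grid quantizer is an assumed stand-in for whatever precision reduction a real scheme would use.

```python
from typing import Callable, List

Weights = List[float]
Operator = Callable[[Weights], Weights]


def prune(density: float) -> Operator:
    """Hypothetical magnitude pruning: keep the `density` fraction of
    weights with the largest absolute value and zero out the rest."""
    def apply(w: Weights) -> Weights:
        k = round(density * len(w))  # number of weights to keep
        order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
        keep = set(order[:k])
        return [x if i in keep else 0.0 for i, x in enumerate(w)]
    return apply


def quantize(step: float) -> Operator:
    """Hypothetical uniform quantizer: snap each weight to the nearest
    multiple of `step`."""
    def apply(w: Weights) -> Weights:
        return [round(x / step) * step for x in w]
    return apply


def compose(*ops: Operator) -> Operator:
    """Chain operators left to right into a single compression scheme."""
    def apply(w: Weights) -> Weights:
        for op in ops:
            w = op(w)
        return w
    return apply
```

Usage: `compose(prune(0.5), quantize(0.25))([0.9, -0.1, 0.4, -0.6])` first keeps the two largest-magnitude weights (0.9 and -0.6), then snaps them to the grid, giving `[1.0, 0.0, 0.0, -0.5]`. Keeping each operator a plain function of the weights is what lets schemes be built by composition; the density passed to `prune` is exactly the kind of knob a sparsity search would tune.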

