LG · NE · ML · May 27, 2018

Compact and Computationally Efficient Representation of Deep Neural Networks

arXiv:1805.10692v2 · 74 citations
Originality: Highly original
AI Analysis

This work addresses the efficiency bottleneck in deploying deep neural networks, particularly in resource-constrained environments, by introducing matrix representations whose memory and computational cost are guaranteed to improve as the entropy of the weight matrices decreases.

The paper tackles the high computational cost of deep neural network inference by introducing new matrix representations whose memory and algorithmic complexity are bounded by the entropy of the matrix, achieving up to 42x compression, 5x speedup, and 90x energy savings on state-of-the-art networks such as AlexNet and ResNet152.

At the core of any inference procedure in deep neural networks are dot product operations, which are the component that requires the highest computational resources. A common approach to reducing the cost of inference is to reduce its memory complexity by lowering the entropy of the weight matrices of the neural network, e.g., by pruning and quantizing their elements. However, the quantized weight matrices are then usually represented in either a dense or a sparse matrix storage format, whose associated dot product complexity is not bounded by the entropy of the matrix. This means that the associated inference complexity ultimately depends on the implicit statistical assumptions that these matrix representations make about the weight distribution, which can in many cases be suboptimal. In this paper we address this issue and present new efficient representations for matrices with low entropy statistics. These new matrix formats have the novel property that their memory and algorithmic complexity are implicitly bounded by the entropy of the matrix, which implies that they are guaranteed to become more efficient as the entropy of the matrix is reduced. In our experiments we show that performing the dot product under these new matrix formats can indeed be more energy and time efficient under practically relevant assumptions. For instance, we are able to attain up to 42x compression ratios, 5x speedups and 90x energy savings when we losslessly convert the weight matrices of state-of-the-art networks such as AlexNet, VGG-16, ResNet152 and DenseNet into the new matrix formats and benchmark their respective dot product operations.
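
The intuition behind the abstract can be illustrated with a small sketch. It is not the paper's actual matrix format, only an assumed toy version of the underlying idea: when a quantized row contains few distinct values, the distributive law lets us sum the inputs that share a weight and multiply once per distinct value. The function names (`matrix_entropy`, `group_row_by_value`, `grouped_matvec`) and the example matrix are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict


def matrix_entropy(matrix):
    """Shannon entropy (bits per element) of the empirical value distribution."""
    values = [w for row in matrix for w in row]
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def group_row_by_value(row):
    """Map each distinct nonzero weight to the column indices where it occurs."""
    groups = defaultdict(list)
    for col, w in enumerate(row):
        if w != 0:
            groups[w].append(col)
    return dict(groups)


def grouped_matvec(grouped_rows, x):
    """Matrix-vector product with one multiplication per distinct value per row."""
    result = []
    for groups in grouped_rows:
        acc = 0.0
        for w, cols in groups.items():
            # Sum the inputs sharing this weight first, then multiply once.
            acc += w * sum(x[c] for c in cols)
        result.append(acc)
    return result


def dense_matvec(matrix, x):
    """Reference dense product: one multiplication per matrix element."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


# Toy quantized weight matrix with only three distinct values (assumed data).
W = [
    [0.5, 0.0, 0.5, -1.0],
    [0.0, 0.0, -1.0, 0.5],
    [0.5, 0.5, 0.0, 0.0],
]
x = [1.0, 2.0, 3.0, 4.0]

grouped = [group_row_by_value(row) for row in W]
print("entropy (bits/element):", matrix_entropy(W))
print("grouped product:", grouped_matvec(grouped, x))
print("dense product:  ", dense_matvec(W, x))
```

In this sketch the number of multiplications per row equals the number of distinct nonzero values in that row, so it shrinks as pruning and quantization reduce the entropy of the matrix. A dense format always pays one multiplication per element, and a plain sparse format one per nonzero. The index lists still have to be stored and traversed, so any real gain depends on how compactly they are encoded, which this toy version does not address.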

