Self-Compressing Neural Networks
This paper addresses the challenge of deploying neural networks efficiently in applications with limited computational resources, though the contribution appears incremental because it builds on existing compression techniques.
The paper tackles the problem of reducing neural network size in order to improve execution time, power consumption, bandwidth, and memory footprint. It proposes Self-Compression, a method that removes redundant weights and reduces the number of bits used to represent the remaining ones, and reports floating point accuracy with as few as 3% of the bits and 18% of the weights remaining.
This work focuses on reducing neural network size, which is a major driver of neural network execution time, power consumption, bandwidth, and memory footprint. A key challenge is to reduce size in a manner that can be exploited readily for efficient training and inference without the need for specialized hardware. We propose Self-Compression, a simple, general method that simultaneously achieves two goals: (1) removing redundant weights, and (2) reducing the number of bits required to represent the remaining weights. This is achieved using a generalized loss function to minimize overall network size. In our experiments, we demonstrate floating point accuracy with as few as 3% of the bits and 18% of the weights remaining in the network.
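To make the idea of a size-aware loss concrete, the sketch below shows one way such an objective could be assembled. The abstract states only that a generalized loss function minimizes overall network size. The specific form used here (task loss plus a weight `gamma` times the fraction of baseline bits still in use), the per-channel integer bit depths, the 32-bit baseline, and all function names are illustrative assumptions rather than the paper's formulation. The sketch also leaves out how bit depths would be made trainable.

```python
# Illustrative sketch only: names, the 32-bit baseline, and the form of the
# penalty are assumptions, not taken from the paper.

def quantize(w, bits, exponent):
    """Round w to a signed fixed-point grid: `bits` bits, step 2**exponent."""
    if bits <= 0:
        return 0.0  # zero bits means the weight is effectively removed
    step = 2.0 ** exponent
    lo = -(2 ** (bits - 1))          # smallest representable integer code
    hi = 2 ** (bits - 1) - 1         # largest representable integer code
    code = max(lo, min(hi, round(w / step)))
    return code * step


def network_bits(channels):
    """Total bits used; `channels` is a list of (num_weights, bits) pairs."""
    return sum(n * max(b, 0) for n, b in channels)


def remaining_fractions(channels, baseline_bits=32):
    """Return (fraction of bits remaining, fraction of weights remaining)."""
    total_weights = sum(n for n, _ in channels)
    kept_weights = sum(n for n, b in channels if b > 0)
    bit_fraction = network_bits(channels) / (total_weights * baseline_bits)
    return bit_fraction, kept_weights / total_weights


def size_penalized_loss(task_loss, channels, gamma, baseline_bits=32):
    """Hypothetical objective: task loss plus gamma times the bit fraction."""
    bit_fraction, _ = remaining_fractions(channels, baseline_bits)
    return task_loss + gamma * bit_fraction


# Three hypothetical channels: 4-bit, fully pruned (0 bits), and 8-bit.
channels = [(100, 4), (100, 0), (50, 8)]
bit_fraction, weight_fraction = remaining_fractions(channels)
loss = size_penalized_loss(task_loss=0.5, channels=channels, gamma=0.2)
```

In this toy setting a channel whose bit depth reaches zero contributes nothing to the size term and its weights quantize to zero, so a single penalty covers both goals named in the abstract: removing weights and shrinking the representation of those that remain.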