Exponential discretization of weights of neural network connections in pre-trained neural networks
This work addresses memory and speed bottlenecks for deploying neural networks, but it is incremental as it builds on existing discretization methods.
The paper tackles the problem of reducing RAM usage and increasing recognition speed in pre-trained neural networks by discretizing connection weights. It finds that exponential discretization matches the accuracy of linear discretization with 1-2 fewer bits: VGG-16 reaches 69% top5 accuracy at 3 bits, and ResNet50 reaches 84% at 4 bits.
To reduce random access memory (RAM) requirements and to increase the speed of recognition algorithms, we consider the weight discretization problem for trained neural networks. We show that exponential discretization is preferable to linear discretization, since it achieves the same accuracy with 1 or 2 fewer bits. The quality of the VGG-16 neural network is already satisfactory (top5 accuracy 69%) with 3-bit exponential discretization. The ResNet50 neural network shows a top5 accuracy of 84% at 4 bits. Other neural networks perform fairly well at 5 bits (the top5 accuracies of Xception, Inception-v3, and MobileNet-v2 were 87%, 90%, and 77%, respectively). With fewer bits, the accuracy decreases rapidly.
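The difference between the two discretization grids can be illustrated with a minimal sketch. It is not the authors' exact scheme, which the abstract does not specify. It assumes that one bit stores the sign and the remaining bits index a magnitude from a geometric progression. The function names and the `w_abs_min` / `w_abs_max` bounds are hypothetical and chosen only for illustration.

```python
def linear_levels(w_min, w_max, bits):
    """Return 2**bits evenly spaced levels covering [w_min, w_max]."""
    n = 2 ** bits
    step = (w_max - w_min) / (n - 1)
    return [w_min + i * step for i in range(n)]


def exponential_levels(w_abs_min, w_abs_max, bits):
    """Return 2**bits signed levels whose magnitudes form a geometric progression.

    Assumption: one bit encodes the sign, the other bits index the magnitude.
    """
    n_mag = 2 ** (bits - 1)
    if n_mag == 1:
        mags = [w_abs_max]
    else:
        # Constant ratio between neighbouring magnitudes.
        ratio = (w_abs_max / w_abs_min) ** (1.0 / (n_mag - 1))
        mags = [w_abs_min * ratio ** k for k in range(n_mag)]
    return sorted([-m for m in mags] + mags)


def quantize(weights, levels):
    """Map each weight to the nearest level in the grid."""
    return [min(levels, key=lambda q: abs(q - w)) for w in weights]


if __name__ == "__main__":
    weights = [-0.8, -0.05, 0.012, 0.04, 0.3]
    lin = linear_levels(-1.0, 1.0, bits=3)
    exp = exponential_levels(0.01, 1.0, bits=3)
    print("linear grid:     ", [round(q, 3) for q in lin])
    print("exponential grid:", [round(q, 3) for q in exp])
    print("linear quantized:     ", [round(q, 3) for q in quantize(weights, lin)])
    print("exponential quantized:", [round(q, 3) for q in quantize(weights, exp)])
```

With the same bit budget, the exponential grid places more levels near zero, where the weights of trained networks often concentrate. That is one plausible reason it can match a linear grid with fewer bits.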