NE
Dec 12, 2016

Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks

Soheil Hashemi, Nicholas Anthony, Hokchhay Tann, R. Iris Bahar, Sherief Reda

arXiv:1612.03940v1 19.8128 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for low-power, low-memory neural network solutions for hardware-constrained applications, though it is incremental as it builds on existing quantization research.

The study comprehensively analyzes the impact of precision quantization on neural networks, finding that scaling to lower bit precisions can significantly reduce energy and memory usage with only modest accuracy drops, and suggests reallocating some benefits to increase network size for improved accuracy.

Deep neural networks are gaining in popularity as they are used to generate state-of-the-art results for a variety of computer vision and machine learning applications. At the same time, these networks have grown in depth and complexity in order to solve harder problems. Given the limitations in power budgets dedicated to these networks, the importance of low-power, low-memory solutions has been stressed in recent years. While a large number of dedicated hardware designs using different precisions have recently been proposed, there exists no comprehensive study of different bit precisions and arithmetic in both inputs and network parameters. In this work, we address this issue and perform a study of different bit precisions in neural networks (from floating-point to fixed-point, powers of two, and binary). In our evaluation, we consider and analyze the effect of precision scaling on both network accuracy and hardware metrics including memory footprint, power and energy consumption, and design area. We also investigate training-time methodologies to compensate for the reduction in accuracy due to limited bit precision and demonstrate that in most cases, precision scaling can deliver significant benefits in design metrics at the cost of very modest decreases in network accuracy. In addition, we propose that a small portion of the benefits achieved when using lower precisions can be forfeited to increase the network size and therefore the accuracy. We run our experiments on three well-recognized networks and datasets to show the generality of our findings. We investigate the trade-offs and highlight the benefits of using lower precisions in terms of energy and memory footprint.
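To make the precision families named in the abstract concrete, the following is a minimal sketch of how a single weight might be mapped to fixed-point, power-of-two, and binary representations, together with the resulting parameter memory footprint. It is not the authors' implementation: the function names, the bit widths, the exponent range, and the choice to keep zero representable in the power-of-two scheme are all illustrative assumptions, since the abstract does not specify these details.

```python
import math

# Illustrative sketch only; all names and default parameters are assumptions,
# not taken from the paper.

def quantize_fixed_point(x, total_bits=8, frac_bits=6):
    """Round x to a signed fixed-point grid and saturate at the range limits."""
    scale = 1 << frac_bits                    # grid step is 1 / scale
    q_min = -(1 << (total_bits - 1))          # most negative integer code
    q_max = (1 << (total_bits - 1)) - 1       # most positive integer code
    q = round(x * scale)                      # Python rounds exact ties to even
    q = max(q_min, min(q_max, q))             # saturate instead of wrapping
    return q / scale

def quantize_power_of_two(x, min_exp=-7, max_exp=0):
    """Map x to sign(x) * 2**e, choosing e by rounding log2(|x|)."""
    if x == 0.0:
        return 0.0                            # assumption: zero stays representable
    e = round(math.log2(abs(x)))              # nearest exponent in the log domain
    e = max(min_exp, min(max_exp, e))         # clamp to the supported exponents
    return math.copysign(2.0 ** e, x)

def quantize_binary(x):
    """Keep only the sign of x, giving a value in {-1, +1}."""
    return 1.0 if x >= 0 else -1.0

def memory_footprint_bytes(num_params, bits_per_param):
    """Storage needed for the parameters alone, rounded up to whole bytes."""
    return math.ceil(num_params * bits_per_param / 8)

if __name__ == "__main__":
    w = 0.3
    print(quantize_fixed_point(w), quantize_power_of_two(w), quantize_binary(w))
    for bits in (32, 16, 8, 1):
        print(bits, "bits:", memory_footprint_bytes(1_000_000, bits), "bytes")
```

A power-of-two weight lets a hardware multiplier be replaced by a shift, and a binary weight reduces it to a sign flip, which is the kind of design-metric saving the abstract refers to. The accuracy cost of each scheme depends on the network and on training-time compensation, neither of which this sketch models.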
