WRPN & Apprentice: Methods for Training and Inference using Low-Precision Numerics
This work addresses the challenge of preserving model accuracy while reducing the resource demands of high-performance deep learning, a prerequisite for deploying large models in resource-constrained environments. The contribution appears incremental, however, since it builds on existing low-precision techniques.
The paper tackles the accuracy degradation that occurs when deep learning models use low-precision numerics to cut compute and memory requirements. It presents three schemes that support both training and efficient inference without hurting accuracy, along with a hardware accelerator designed to exploit these methods.
Today's high-performance deep learning architectures involve large models with numerous parameters. Low-precision numerics has emerged as a popular technique to reduce both the compute and memory requirements of these large models. However, lowering precision often leads to accuracy degradation. We describe three schemes whereby one can both train and perform efficient inference using low-precision numerics without hurting accuracy. Finally, we describe an efficient hardware accelerator that can take advantage of the proposed low-precision numerics.
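The abstract does not spell out the three schemes, but the trade-off it describes is easy to see in a small example. Fewer bits per value mean less memory, at the cost of larger approximation error. The sketch below assumes plain symmetric, per-tensor uniform quantization of a synthetic Gaussian weight vector. The helpers `quantize_symmetric`, `mean_abs_error`, and `storage_bytes`, as well as the weight distribution, are hypothetical choices made for illustration. They are not WRPN, Apprentice, or any method taken from the paper.

```python
import random


def quantize_symmetric(values, bits):
    """Map floats onto a signed `bits`-bit integer grid and back.

    Hypothetical helper: generic symmetric, per-tensor uniform quantization,
    used only to illustrate the precision/accuracy trade-off. It is not the
    WRPN or Apprentice scheme.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits (sign + magnitude)")
    levels = 2 ** (bits - 1) - 1          # e.g. 127 positive levels at 8 bits
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / levels              # float value of one integer step
    # Round to the nearest level (Python's round() breaks ties to even),
    # then clamp to the representable range.
    q = [max(-levels, min(levels, round(v / scale))) for v in values]
    dequantized = [qi * scale for qi in q]
    return dequantized, q


def mean_abs_error(a, b):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def storage_bytes(num_params, bits):
    """Bytes needed if `num_params` values are bit-packed at `bits` each."""
    return (num_params * bits + 7) // 8


# Synthetic "weights": an assumption standing in for a real layer.
random.seed(0)
weights = [random.gauss(0.0, 0.1) for _ in range(10_000)]

print(f"fp32 baseline: {storage_bytes(len(weights), 32)} bytes")
for bits in (8, 4, 2):
    deq, _ = quantize_symmetric(weights, bits)
    print(f"{bits}-bit: {storage_bytes(len(weights), bits)} bytes, "
          f"mean |error| = {mean_abs_error(weights, deq):.5f}")
```

Storage shrinks linearly with bit width, from 40,000 bytes at fp32 down to 2,500 bytes at 2 bits. Over the same range, the reconstruction error grows. This growing error is the accuracy pressure that low-precision training and inference schemes must counter. Real deployments also need per-tensor scale factors and hardware support for packed arithmetic, which this sketch ignores.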