NE · Apr 17, 2018

DPRed: Making Typical Activation and Weight Values Matter In Deep Learning Computing

Alberto Delmas, Sayeh Sharify, Patrick Judd, Kevin Siu, Milos Nikolic, Andreas Moshovos

arXiv:1804.06732v3 · 9.97 citations

Originality: Highly original

AI Analysis

This work addresses the high computational and memory costs of deep learning, a central concern for hardware designers, and offers a novel method to improve performance and energy efficiency.

The paper tackles the inefficiency of using a uniform precision for all values in deep neural networks by proposing Dynamic Precision Reduction (DPRed), which groups weights and activations and encodes each group with its own precision. On average, this reduces off-chip traffic to nearly 35% and 33% of the uncompressed traffic for 16-bit and 8-bit models, respectively, and two hardware variants that exploit the dynamic precision variability achieve speedups of 1.82x and 2.81x on 8-bit networks.
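To make the per-group encoding concrete, here is a minimal Python sketch under stated assumptions: values are split into groups of 16, and each group is stored as a small precision header followed by every value at that group's precision. The function names, the group size of 16 and the header width (`word_bits.bit_length()` bits, enough to encode any precision from 0 to `word_bits`) are hypothetical choices for illustration, not the paper's exact storage format, and the sketch does not reproduce the reported 35%/33% figures.

```python
from typing import Sequence


def group_precision(group: Sequence[int], signed: bool = False) -> int:
    """Smallest bit width that represents every value in the group."""
    if signed:
        # Two's complement: n bits cover [-2**(n-1), 2**(n-1) - 1], so a value
        # v >= 0 needs v.bit_length() + 1 bits and a negative v needs
        # (~v).bit_length() + 1 bits (the extra bit is the sign).
        return max((v if v >= 0 else ~v).bit_length() + 1 for v in group)
    # Unsigned values (e.g. post-ReLU activations): width of the largest value.
    return max(v.bit_length() for v in group)


def compressed_bits(values: Sequence[int], word_bits: int = 16,
                    group_size: int = 16, signed: bool = False) -> int:
    """Total bits when each group stores a precision header plus its values
    at the group's precision (hypothetical layout, for illustration only)."""
    header_bits = word_bits.bit_length()  # can encode any precision 0..word_bits
    total = 0
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        total += header_bits + group_precision(group, signed) * len(group)
    return total


def traffic_ratio(values: Sequence[int], word_bits: int = 16,
                  group_size: int = 16, signed: bool = False) -> float:
    """Compressed size as a fraction of the fixed-precision size."""
    return compressed_bits(values, word_bits, group_size, signed) / (len(values) * word_bits)


if __name__ == "__main__":
    # Two 8-bit groups of 16 activations: one with small values, one with an outlier.
    acts = [0, 3, 1, 0] * 4 + [200, 5, 0, 7] * 4
    print(compressed_bits(acts, word_bits=8))  # 168 bits instead of 256
    print(traffic_ratio(acts, word_bits=8))    # 0.65625
```

In this toy input the first group needs only 2 bits per value while the second needs the full 8, so the outlier inflates only its own group. A single per-layer precision would have to spend 8 bits on all 32 values.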

We show that selecting a single data type (precision) for all values in Deep Neural Networks, even if that data type is different per layer, amounts to worst-case design. Much shorter data types can be used if we target the common case by adjusting the precision at a much finer granularity. We propose Dynamic Precision Reduction (DPRed), where we group weights and activations and encode them using a precision specific to each group. The per-group precisions are selected statically for the weights and dynamically by hardware for the activations. We exploit these precisions to reduce: 1) off-chip storage and off- and on-chip communication, and 2) execution time. DPRed compression reduces off-chip traffic to nearly 35% and 33% of the uncompressed traffic, on average, for 16b and 8b models respectively. This makes it possible to sustain higher performance for a given off-chip memory interface while also boosting energy efficiency. We also demonstrate designs for convolutional and fully-connected layers where the time required to process each group of activations and/or weights scales proportionally to the precision that group uses. This improves execution time and energy efficiency for both dense and sparse networks. We show the techniques work with 8-bit networks, where 1.82x and 2.81x speedups are achieved for two different hardware variants that take advantage of dynamic precision variability.
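The claim that processing time can scale with each group's precision is easiest to see in a bit-serial dot product, where one activation bit is consumed per step. The minimal Python sketch below is an illustration under assumptions, not the DPRed hardware: `bit_serial_dot` and `ideal_speedup` are hypothetical helpers, activations are assumed unsigned (e.g. post-ReLU), and the speedup estimate ignores all hardware overheads, so it says nothing about the reported 1.82x and 2.81x figures.

```python
from typing import List, Sequence, Tuple


def bit_serial_dot(weights: Sequence[int], activations: Sequence[int]) -> Tuple[int, int]:
    """Dot product that consumes one activation bit per step (LSB first).

    Activations are assumed unsigned. Returns (result, steps); steps equals
    the group's activation precision, so a group whose largest activation
    fits in 3 bits finishes in 3 steps instead of, say, 8.
    """
    precision = max(a.bit_length() for a in activations)
    acc = 0
    for b in range(precision):
        # One step: add every weight whose activation has bit b set, then
        # scale the partial sum by 2**b.
        partial = sum(w for w, a in zip(weights, activations) if (a >> b) & 1)
        acc += partial << b
    return acc, precision


def ideal_speedup(activation_groups: List[Sequence[int]], word_bits: int = 8) -> float:
    """Fixed-precision steps divided by per-group-precision steps.

    Ignores stalls, synchronization and other overheads; assumes each group
    takes at least one step even if all of its activations are zero.
    """
    fixed = word_bits * len(activation_groups)
    dynamic = sum(max(max(a.bit_length() for a in g), 1) for g in activation_groups)
    return fixed / dynamic


if __name__ == "__main__":
    print(bit_serial_dot([1, 2, 3, 4], [3, 0, 5, 1]))     # (22, 3)
    print(ideal_speedup([[3, 0, 5, 1], [255, 1, 2, 3]]))  # 16 / 11, about 1.45
```

Because the loop runs once per activation bit, a group whose values all fit in 3 bits costs 3 steps rather than the 8 a fixed 8-bit design would spend. Choosing the precision per group rather than per layer is what lets these savings survive even when a layer contains a few large outliers.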
