Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing
This work improves the efficiency of hardware accelerators for deep learning, but it is incremental, as it builds upon the earlier Cnvlutin design.
The paper tackles the problem of reducing memory and energy consumption in deep neural network accelerators by skipping ineffectual activations and weights. It does so through modifications to the Cnvlutin accelerator that include new encodings and memory access strategies.
We discuss several modifications and extensions to the previously proposed Cnvlutin (CNV) accelerator for the convolutional and fully-connected layers of deep learning networks. We first describe different encodings of the activations that are deemed ineffectual. These encodings differ in memory overhead and energy characteristics. We propose using a level of indirection when accessing activations from memory, which reduces their memory footprint by storing only the effectual activations. We also present a modified organization that detects ineffectual activations while fetching them from memory. This differs from the original design, which detected them at the output of the preceding layer. Finally, we present an extended CNV that can also skip ineffectual weights.
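The idea of storing only effectual activations behind a level of indirection, and then skipping ineffectual weights as well, can be illustrated with a small software sketch. This is not the accelerator's actual encoding or datapath. The group size `BRICK`, the (offset, value) pair layout, the pointer table, and the function names are all assumptions made for illustration, and "ineffectual" is taken here to mean a magnitude at or below a threshold (zero by default).

```python
from typing import List, Tuple

BRICK = 4  # hypothetical group size, kept small for readability


def encode_brick(acts: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Keep only effectual activations as (offset, value) pairs."""
    return [(i, a) for i, a in enumerate(acts) if abs(a) > threshold]


def pack(activations: List[int]) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Store effectual pairs densely; pointers[b] is where brick b starts."""
    storage: List[Tuple[int, int]] = []
    pointers: List[int] = []
    for start in range(0, len(activations), BRICK):
        pointers.append(len(storage))
        storage.extend(encode_brick(activations[start:start + BRICK]))
    pointers.append(len(storage))  # sentinel: end of the last brick
    return storage, pointers


def dot_skipping(storage, pointers, weights: List[int]) -> Tuple[int, int]:
    """Dot product that visits only effectual activations and skips zero weights.

    Returns the result and the number of multiplications actually performed.
    """
    total = 0
    macs = 0
    for b in range(len(pointers) - 1):
        # The pointer table is the level of indirection: it locates brick b
        # inside the dense storage without scanning skipped entries.
        for offset, value in storage[pointers[b]:pointers[b + 1]]:
            w = weights[b * BRICK + offset]
            if w == 0:  # ineffectual weight: no multiplication needed
                continue
            total += value * w
            macs += 1
    return total, macs


activations = [0, 3, 0, 0, 5, 0, 0, 2]
weights = [1, 2, 3, 4, 0, 6, 7, 8]
storage, pointers = pack(activations)
result, macs = dot_skipping(storage, pointers, weights)
```

In this example only three of the eight activations are stored, and only two multiplications are performed, yet the result matches the dense dot product. The trade-off the sketch makes visible is the one the abstract points to: the offsets and the pointer table are overhead that must be paid for by the activations that are no longer stored or processed.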