CV LG · Apr 28, 2021

Deep Neural Networks Based Weight Approximation and Computation Reuse for 2-D Image Classification

Mohammed F. Tolba, Huruy Tekle Tesfai, Hani Saleh, Baker Mohammad, Mahmoud Al-Qutayri

arXiv:2105.02954v1

Originality: Incremental advance

AI Analysis

This work addresses hardware efficiency for DNNs on IoT edge devices, offering an incremental improvement through hybrid techniques.

The paper targets the compute and memory cost of deep neural networks on resource-constrained IoT devices by fusing weight approximation with computation reuse. In an analysis covering MNIST and CIFAR-10, it reports a 1211.3x reduction in LeNet-5 parameters with an accuracy drop of less than 0.9%, and 54% fewer adders and multipliers than the Row Stationary method.

Deep Neural Networks (DNNs) are computationally and memory intensive, which makes their hardware implementation challenging, especially for resource-constrained devices such as IoT nodes. To address this challenge, this paper introduces a new method that improves DNN performance by fusing approximate computing with data reuse techniques for image recognition applications. DNN weights are approximated with linear and quadratic approximation methods during the training phase. All of the weights are then replaced with the linear/quadratic coefficients, so that inference can compute different weights from the same coefficients. This leads to a repetition of the weights across the processing element (PE) array, which in turn enables reusing DNN sub-computations (computational reuse) and leveraging the same data (data reuse) to reduce DNN computations and memory accesses and to improve energy efficiency, albeit at the cost of increased training time. A complete analysis on both the MNIST and CIFAR-10 datasets is presented for image recognition, where LeNet-5 showed a reduction in the number of parameters by a factor of 1211.3x with a drop of less than 0.9% in accuracy. Compared to the state-of-the-art Row Stationary (RS) method, the proposed architecture saved 54% of the total number of adders and multipliers needed. Overall, the proposed approach is suitable for IoT edge devices because it reduces both the required memory size and the number of memory accesses.
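The idea of storing a few linear or quadratic coefficients in place of many weights can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract says the approximation happens during training, whereas the example below fits an already-given weight vector after the fact. Modeling the weights as a polynomial in their index, the helper names `fit_coefficients` and `reconstruct`, and the synthetic 25-element weight vector are all assumptions made for illustration.

```python
import numpy as np

def fit_coefficients(weights, degree):
    """Least-squares fit of a weight vector as a polynomial in its index.

    Assumption: weights are modeled as a function of their position;
    the paper may parameterize the approximation differently.
    """
    idx = np.arange(len(weights), dtype=float)
    return np.polyfit(idx, weights, degree)  # coefficients, highest power first

def reconstruct(coeffs, n):
    """Recompute n approximate weights from the stored coefficients."""
    idx = np.arange(n, dtype=float)
    return np.polyval(coeffs, idx)

# Synthetic stand-in for one flattened 5x5 filter (hypothetical data).
rng = np.random.default_rng(0)
weights = np.linspace(-0.5, 0.5, 25) + 0.01 * rng.standard_normal(25)

linear = fit_coefficients(weights, 1)     # 2 stored values
quadratic = fit_coefficients(weights, 2)  # 3 stored values

approx = reconstruct(quadratic, weights.size)
max_err = np.max(np.abs(weights - approx))

# Storage ratio for this toy vector only: 25 weights vs. 3 coefficients.
stored_ratio = weights.size / quadratic.size

# Sum of squared errors for each fit; the quadratic fit cannot be worse.
sse_linear = np.sum((weights - reconstruct(linear, weights.size)) ** 2)
sse_quadratic = np.sum((weights - approx) ** 2)
```

In this toy setting only the coefficients need to be kept in memory, and the weights are recomputed when needed. The ratio here applies to the synthetic vector alone and says nothing about the 1211.3x figure reported for LeNet-5.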
