NE AR Sep 29, 2015

VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing

Arash Ardakani, François Leduc-Primeau, Naoya Onizawa, Takahiro Hanyu, Warren J. Gross

arXiv:1509.08972v2 16.7196 citations

Originality: Incremental advance

AI Analysis

This work addresses hardware efficiency for DNNs in applications requiring low-power, area-efficient implementations, representing an incremental improvement over prior stochastic computing approaches.

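The paper tackles the large area and long latency of hardware implementations of deep neural networks by proposing an integral stochastic computing architecture. On a Virtex7 FPGA it reports average reductions of 45% in area and 62% in latency relative to the best previously reported architecture. In 65 nm CMOS it reports up to 21% lower energy consumption than a binary radix implementation at the same misclassification rate, rising to 33% with a quasi-synchronous implementation.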

The hardware implementation of deep neural networks (DNNs) has recently received tremendous attention: many applications in fact require high-speed operations that suit a hardware implementation. However, numerous elements and complex interconnections are usually required, leading to a large area occupation and copious power consumption. Stochastic computing has shown promising results for low-power, area-efficient hardware implementations, even though existing stochastic algorithms require long streams that cause long latencies. In this paper, we propose an integer form of stochastic computation and introduce some elementary circuits. We then propose an efficient implementation of a DNN based on integral stochastic computing. The proposed architecture has been implemented on a Virtex7 FPGA, resulting in 45% and 62% average reductions in area and latency, respectively, compared to the best reported architecture in the literature. We also synthesize the circuits in a 65 nm CMOS technology and show that the proposed integral stochastic architecture results in up to a 21% reduction in energy consumption compared to the binary radix implementation at the same misclassification rate. Due to the fault-tolerant nature of stochastic architectures, we also consider a quasi-synchronous implementation, which yields a 33% reduction in energy consumption w.r.t. the binary radix implementation without any compromise on performance.
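The "integer form of stochastic computation" mentioned in the abstract can be illustrated with a small software model. The sketch below is not the authors' circuit. It assumes a unipolar encoding, assumes that an integer stream is built as the element-wise sum of m independent binary streams, and assumes that multiplying by a binary stream amounts to gating the integer value with each bit. All function names are hypothetical.

```python
import random


def binary_stream(p, length, rng):
    """Unipolar stochastic stream: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def integral_stream(value, m, length, rng):
    """Integer stream for a value in [0, m].

    Assumed construction: the element-wise sum of m independent binary
    streams, each encoding value / m, so every element lies in {0, ..., m}.
    """
    streams = [binary_stream(value / m, length, rng) for _ in range(m)]
    return [sum(bits) for bits in zip(*streams)]


def decode(stream):
    """Estimate the encoded value as the mean of the stream."""
    return sum(stream) / len(stream)


def multiply(int_stream, bin_stream):
    """Gate each integer element with the matching bit.

    This generalizes the AND-gate multiplier of binary stochastic
    computing and assumes the two streams are independent.
    """
    return [s * b for s, b in zip(int_stream, bin_stream)]


def demo(seed=0, length=4096):
    """Encode 1.5 with m = 2, multiply by 0.5, and decode both streams."""
    rng = random.Random(seed)
    x = integral_stream(1.5, 2, length, rng)
    w = binary_stream(0.5, length, rng)
    return decode(x), decode(multiply(x, w))
```

Calling `demo()` returns two estimates that should lie close to 1.5 and 0.75. With m binary streams summed per element, a value above 1 can be represented directly, which is one intuition for why shorter streams may suffice than in a purely binary stochastic encoding.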
