LG DC NE
May 23, 2016

DLAU: A Scalable Deep Learning Accelerator Unit on FPGA

Chao Wang, Qi Yu, Lei Gong, Xi Li, Yuan Xie, Xuehai Zhou

arXiv:1605.06894v1
12.8
315 citations

Originality: Incremental advance

AI Analysis

This work addresses the performance and power-efficiency challenges of deep learning applications, though it is incremental because it builds on existing FPGA acceleration methods.

The paper tackles the challenge of implementing large-scale deep neural networks efficiently by designing DLAU, a scalable FPGA-based accelerator that achieves up to a 36.1x speedup over Intel Core2 processors at a power consumption of 234 mW.

As an emerging field of machine learning, deep learning shows an excellent ability to solve complex learning problems. However, the networks are becoming increasingly large to meet the demands of practical applications, which makes it significantly harder to build high-performance implementations of deep neural networks. To improve performance while keeping power consumption low, in this paper we design DLAU, a scalable accelerator architecture for large-scale deep learning networks that uses an FPGA as the hardware prototype. The DLAU accelerator employs three pipelined processing units to improve throughput and uses tiling techniques to exploit locality in deep learning applications. Experimental results on a state-of-the-art Xilinx FPGA board demonstrate that the DLAU accelerator achieves up to a 36.1x speedup compared to Intel Core2 processors, with a power consumption of 234 mW.
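To make the two ideas in the abstract concrete, here is a minimal software sketch of tiling a fully connected layer and running it through three stages. The abstract does not say what each processing unit does, so the split below (tile multiply, partial-sum accumulation, sigmoid activation) is an assumption, and every function name and the `tile` parameter are hypothetical. It models the arithmetic and a simple step count only, not the FPGA hardware.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def tile_multiply(weights: Matrix, x: Vector, row0: int, col0: int, tile: int) -> Vector:
    """Hypothetical stage 1: multiply one tile x tile block of W by the matching slice of x."""
    rows = range(row0, min(row0 + tile, len(weights)))
    cols = range(col0, min(col0 + tile, len(x)))
    return [sum(weights[r][c] * x[c] for c in cols) for r in rows]


def accumulate(partial: Vector, acc: Vector, row0: int) -> None:
    """Hypothetical stage 2: add one tile's partial sums into the output accumulator."""
    for i, p in enumerate(partial):
        acc[row0 + i] += p


def sigmoid(v: Vector) -> Vector:
    """Hypothetical stage 3: apply the activation function to the finished sums."""
    return [1.0 / (1.0 + math.exp(-s)) for s in v]


def tiled_layer(weights: Matrix, x: Vector, tile: int) -> Vector:
    """Compute sigmoid(W @ x) one tile at a time, so only a small block of W
    and a small slice of x need to be resident at once (the locality idea)."""
    acc = [0.0] * len(weights)
    for row0 in range(0, len(weights), tile):
        for col0 in range(0, len(x), tile):
            accumulate(tile_multiply(weights, x, row0, col0, tile), acc, row0)
    return sigmoid(acc)


def pipeline_steps(num_tiles: int, stages: int = 3) -> int:
    """Steps when a new tile enters the pipeline every step:
    fill latency (stages - 1) plus one step per tile."""
    return stages + num_tiles - 1 if num_tiles > 0 else 0


def sequential_steps(num_tiles: int, stages: int = 3) -> int:
    """Steps when each tile must pass through all stages before the next starts."""
    return stages * num_tiles


if __name__ == "__main__":
    W = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]
    x = [1.0, 1.0, 1.0]
    print(tiled_layer(W, x, tile=2))
    print(pipeline_steps(8), sequential_steps(8))  # 10 24
```

Under these assumptions, the tile size changes only the order in which partial sums are gathered, not the result. With one tile entering per step, three overlapped stages process 8 tiles in 10 steps instead of 24, which is the throughput benefit that pipelining the units is meant to provide.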

