LG NE, Feb 14, 2018

Field-Programmable Deep Neural Network (DNN) Learning and Inference accelerator: a concept

arXiv:1802.04899v4, 1 citation

AI Analysis

This work addresses the need for efficient hardware acceleration for diverse DNN designs, offering a flexible solution that can optimize performance based on computational loads, though it appears incremental as it builds on existing accelerator concepts with reconfigurability.

The paper tackles the problem of accelerating deep neural network (DNN) learning and inference by proposing a field-programmable accelerator (FProg-DNN) that uses hybrid systolic and non-systolic techniques. It reports a speedup of more than 50x relative to GPUs or TPUs, obtained in behavioral-functional simulation through a pipelined architecture and reconfigurable worker allocation per layer.

An accelerator is a specialized integrated circuit designed to perform specific computations faster than a CPU or GPU could perform them. This paper proposes a Field-Programmable DNN learning and inference accelerator (FProg-DNN) that uses hybrid systolic and non-systolic techniques, distributed information-control, and a deep pipelined structure, and presents its microarchitecture and operation. Reconfigurability accommodates diverse DNN designs and allows a different number of workers to be assigned to each layer as a function of the relative difference in computational load among layers. The computational delay per layer is made roughly the same along the pipelined accelerator structure. VGG-16 and the recently proposed Inception modules are used to show the flexibility of FProg-DNN reconfigurability. Special structures were also added for a combination of convolution layer, map coincidence, and feedback for state-of-the-art learning with a small set of examples, which is the focus of a companion paper by the author (Franca-Neto, 2018). The accelerator can be reconfigured across a range that runs from (1) allocating all of a DNN's computations to a single worker, the extreme of sub-optimal performance, to (2) optimally allocating workers per layer according to the computational load of each DNN layer to be realized. Due to the pipelined architecture, a speedup of more than 50x is achieved relative to GPUs or TPUs. This speedup is a consequence of hiding the delay in transporting activation outputs from one layer to the next behind the computations in the receiving layer. The FProg-DNN concept has been simulated and validated at the behavioral-functional level.
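The idea of assigning workers per layer so that every pipeline stage takes roughly the same time can be illustrated with a small toy model. The sketch below is not the paper's allocation method: the function names (`allocate_workers`, `stage_delays`, `steady_state_interval`), the greedy strategy, the example loads, and the assumptions that a layer's work divides evenly among its workers and that transport delay is fully hidden are all hypothetical simplifications chosen for illustration.

```python
from typing import List


def allocate_workers(layer_loads: List[float], total_workers: int) -> List[int]:
    """Greedy toy allocation (an assumption, not the paper's algorithm).

    Every layer starts with one worker; each remaining worker goes to the
    layer whose per-worker load is currently the largest.
    """
    if total_workers < len(layer_loads):
        raise ValueError("this sketch needs at least one worker per layer")
    workers = [1] * len(layer_loads)
    for _ in range(total_workers - len(layer_loads)):
        # Index of the currently slowest stage (first one wins on ties).
        slowest = max(
            range(len(layer_loads)),
            key=lambda i: layer_loads[i] / workers[i],
        )
        workers[slowest] += 1
    return workers


def stage_delays(layer_loads: List[float], workers: List[int]) -> List[float]:
    """Per-layer delay, assuming work splits perfectly among a layer's workers."""
    return [load / w for load, w in zip(layer_loads, workers)]


def steady_state_interval(delays: List[float]) -> float:
    """In an ideal pipeline, one result emerges every max(stage delay)."""
    return max(delays)


# Hypothetical relative loads for a four-layer network (arbitrary units).
loads = [4.0, 2.0, 1.0, 1.0]
workers = allocate_workers(loads, total_workers=8)
delays = stage_delays(loads, workers)

print("workers per layer:", workers)
print("delay per layer:  ", delays)
print("pipelined interval:", steady_state_interval(delays))
print("single-worker time per input:", sum(loads))
```

In this toy case the heavier layers receive more workers until all stage delays match, so the pipeline interval equals the delay of any one stage, while a single worker would need the sum of all layer loads per input. The numbers are illustrative only and say nothing about the speedup reported in the abstract.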
