DC LG · May 31, 2018

A Highly Parallel FPGA Implementation of Sparse Neural Network Training

Sourya Dey, Diandian Chen, Zongyang Li, Souvik Kundu, Kuan-Wen Huang, Keith M. Chugg, Peter A. Beerel

arXiv:1806.01087v2 · 5.95 citations

Originality: Incremental advance

AI Analysis

This work addresses efficient on-chip training of sparse neural networks. The contribution is incremental because it builds on existing FPGA and sparsity methods.

The paper tackles the problem of training sparse neural networks on FPGAs. It develops a parallel, reconfigurable architecture that reduces memory and computational requirements, and it presents a proof-of-concept implementation on an Artix-7 FPGA.

We demonstrate an FPGA implementation of a parallel and reconfigurable architecture for sparse neural networks, capable of on-chip training and inference. The network connectivity uses pre-determined, structured sparsity to significantly reduce complexity by lowering memory and computational requirements. The architecture uses a notion of edge-processing, leading to efficient pipelining and parallelization. Moreover, the device can be reconfigured to trade off resource utilization with training time to fit networks and datasets of varying sizes. The combined effects of complexity reduction and easy reconfigurability enable significantly greater exploration of network hyperparameters and structures on-chip. As proof of concept, we show implementation results on an Artix-7 FPGA.
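The following rough software sketch illustrates the two ideas in the abstract: pre-determined structured sparsity and edge-based processing. Every detail is an assumption made for illustration, not the authors' hardware design. These assumptions include the connection pattern (edge k links input k // d_out to output k % n_out), the ReLU activation, the squared-error loss, and the hypothetical parallelism parameter z (edges processed per clock cycle). The feedforward, backpropagation and update phases each sweep only the stored edges, so the work scales with the number of edges rather than with n_in × n_out. The function cycles_per_pass shows how lowering z trades fewer parallel units for more cycles.

```python
import math

def structured_edges(n_in, n_out, d_out):
    """Hypothetical pre-defined sparse pattern (not the paper's): edge k = i*d_out + t
    links input i to output k % n_out. Every input has fan-out d_out, every output
    has fan-in d_in = n_in*d_out/n_out, and no edge repeats when d_out <= n_out."""
    total = n_in * d_out
    if total % n_out or d_out > n_out:
        raise ValueError("need n_out | n_in*d_out and d_out <= n_out")
    return [(k // d_out, k % n_out) for k in range(total)]

def cycles_per_pass(num_edges, z):
    """Clock cycles to sweep all edges once when z edges are processed in parallel."""
    return math.ceil(num_edges / z)

def train_step(x, target, edges, w, b, lr=0.1):
    """One SGD step on a single sparse ReLU layer with squared-error loss.
    w holds one weight per edge and is updated in place, as is b."""
    n_out = len(b)
    # Feedforward: each edge adds w_k * x_i into its output neuron's accumulator
    acc = list(b)
    for k, (i, j) in enumerate(edges):
        acc[j] += w[k] * x[i]
    a = [max(0.0, s) for s in acc]
    # Output deltas: dL/d(acc_j) for L = 0.5 * sum_j (a_j - t_j)^2
    delta = [(a[j] - target[j]) * (1.0 if acc[j] > 0 else 0.0) for j in range(n_out)]
    # Backpropagation: each edge sends w_k * delta_j back to its input neuron,
    # using pre-update weights (what an earlier layer would need)
    grad_x = [0.0] * len(x)
    for k, (i, j) in enumerate(edges):
        grad_x[i] += w[k] * delta[j]
    # Update: each stored edge gets gradient delta_j * x_i; absent edges cost nothing
    for k, (i, j) in enumerate(edges):
        w[k] -= lr * delta[j] * x[i]
    for j in range(n_out):
        b[j] -= lr * delta[j]
    return a, grad_x

if __name__ == "__main__":
    edges = structured_edges(n_in=8, n_out=4, d_out=2)  # 16 edges vs 32 if dense
    w, b = [0.1] * len(edges), [0.0] * 4
    out, _ = train_step([1.0] * 8, [1.0] * 4, edges, w, b)
    print(len(edges), cycles_per_pass(len(edges), z=4))  # 16 4
```

With d_out = 2, this 8-to-4 junction stores 16 weights instead of 32. Halving z from 4 to 2 doubles the sweep from 4 to 8 cycles, which illustrates the resource-versus-training-time trade-off.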
