DC, LG
May 31, 2018

A Highly Parallel FPGA Implementation of Sparse Neural Network Training

arXiv:1806.01087v2
5 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient on-chip training for sparse neural networks. The advance is incremental because it builds on existing FPGA and sparsity methods.

The paper tackles the problem of training sparse neural networks on FPGAs by developing a parallel, reconfigurable architecture that reduces memory and computational requirements. It demonstrates the design with a proof-of-concept implementation on an Artix-7 FPGA.

We demonstrate an FPGA implementation of a parallel and reconfigurable architecture for sparse neural networks, capable of on-chip training and inference. The network connectivity uses pre-determined, structured sparsity to significantly reduce complexity by lowering memory and computational requirements. The architecture uses a notion of edge-processing, leading to efficient pipelining and parallelization. Moreover, the device can be reconfigured to trade off resource utilization with training time to fit networks and datasets of varying sizes. The combined effects of complexity reduction and easy reconfigurability enable significantly greater exploration of network hyperparameters and structures on-chip. As proof of concept, we show implementation results on an Artix-7 FPGA.
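The two ideas in the abstract, a connectivity pattern fixed before training and a trade-off between parallel hardware and training time, can be illustrated with a minimal software sketch. This is not the paper's architecture or its sparsity pattern: the random fixed fan-in mask is a stand-in, and the names `fixed_fan_in_mask`, `sparse_forward`, `cycles_per_layer`, and `edges_per_cycle` are hypothetical, chosen here for illustration only. The cycle estimate assumes one edge (weight) is handled per processing unit per cycle, which is a simplification.

```python
import math
import random


def fixed_fan_in_mask(n_in, n_out, fan_in, seed=0):
    """Build a 0/1 mask (n_out rows, n_in columns) with exactly
    `fan_in` ones per row. The mask is chosen once, before training,
    and never changes (assumed stand-in for pre-determined sparsity)."""
    rng = random.Random(seed)
    mask = []
    for _ in range(n_out):
        chosen = set(rng.sample(range(n_in), fan_in))
        mask.append([1 if i in chosen else 0 for i in range(n_in)])
    return mask


def count_edges(mask):
    """Number of connections (edges) that actually exist in the layer."""
    return sum(sum(row) for row in mask)


def sparse_forward(weights, mask, x):
    """Weighted sum per output neuron, using only the edges kept by the mask."""
    return [
        sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        for w_row, m_row in zip(weights, mask)
    ]


def cycles_per_layer(num_edges, edges_per_cycle):
    """Rough cycle count if `edges_per_cycle` edges are processed in parallel
    (hypothetical model: one edge per processing unit per cycle)."""
    return math.ceil(num_edges / edges_per_cycle)


if __name__ == "__main__":
    mask = fixed_fan_in_mask(n_in=8, n_out=4, fan_in=2)
    edges = count_edges(mask)   # 8 edges, versus 32 for a dense 8x4 layer
    weights = [[1.0] * 8 for _ in range(4)]
    outputs = sparse_forward(weights, mask, [1.0] * 8)
    print("edges:", edges, "outputs:", outputs)
    # More parallel units means fewer cycles, at the cost of more hardware.
    for z in (2, 4, 8):
        print("edges_per_cycle =", z, "-> cycles =", cycles_per_layer(edges, z))
```

In this toy model, storing and processing only the masked edges is what lowers the memory and compute cost, and varying `edges_per_cycle` mimics trading resource utilization against training time.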

Code Implementations: 1 repo
