LG, AR, PF, PL, ML | Jul 27, 2018

FPGA-Based CNN Inference Accelerator Synthesized from Multi-Threaded C Software

arXiv:1807.10695v1 | 41 citations
Originality: Incremental advance
AI Analysis

This work provides a domain-specific solution for efficient CNN inference on FPGAs. It is an incremental advance whose main novelty is a zero-weight-skipping approach to convolution.

The paper tackles the problem of accelerating CNN inference by synthesizing a multi-threaded C program into FPGA hardware, reaching a peak performance of 138 effective GOPS on VGG-16.

A deep-learning inference accelerator is synthesized from a C-language software program parallelized with Pthreads. The software implementation uses the well-known producer/consumer model with parallel threads interconnected by FIFO queues. The LegUp high-level synthesis (HLS) tool synthesizes threads into parallel FPGA hardware, translating software parallelism into spatial parallelism. A complete system is generated where convolution, pooling and padding are realized in the synthesized accelerator, with remaining tasks executing on an embedded ARM processor. The accelerator incorporates reduced precision, and a novel approach for zero-weight-skipping in convolution. On a mid-sized Intel Arria 10 SoC FPGA, peak performance on VGG-16 is 138 effective GOPS.
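The producer/consumer structure described in the abstract can be illustrated with a minimal Pthreads sketch. This is not the paper's code: the names `fifo_t`, `token_t`, `job_t`, and `sparse_dot`, the queue depth, and the choice to skip zero weights inside the producer thread are all assumptions made for illustration. One thread streams only the non-zero 8-bit weights into a bounded FIFO, and a second thread performs the multiply-accumulate. An HLS tool such as LegUp would map each thread to its own hardware unit; here they simply run as software threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 8 /* hypothetical queue depth */

/* A non-zero weight, the index of its activation, and an end-of-stream flag. */
typedef struct { int8_t weight; size_t index; int last; } token_t;

/* Bounded FIFO guarded by a mutex and two condition variables. */
typedef struct {
    token_t buf[FIFO_DEPTH];
    size_t head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} fifo_t;

static void fifo_init(fifo_t *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

static void fifo_push(fifo_t *q, token_t t) {
    pthread_mutex_lock(&q->lock);
    while (q->count == FIFO_DEPTH)            /* block while the queue is full */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->buf[q->tail] = t;
    q->tail = (q->tail + 1) % FIFO_DEPTH;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static token_t fifo_pop(fifo_t *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                     /* block while the queue is empty */
        pthread_cond_wait(&q->not_empty, &q->lock);
    token_t t = q->buf[q->head];
    q->head = (q->head + 1) % FIFO_DEPTH;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return t;
}

typedef struct {
    fifo_t *q;
    const int8_t *weights, *acts;
    size_t n;
    int32_t result;
} job_t;

/* Producer: forwards only non-zero weights, then an end-of-stream token. */
static void *producer(void *arg) {
    job_t *j = arg;
    for (size_t i = 0; i < j->n; i++) {
        if (j->weights[i] == 0) continue;     /* zero-weight skipping (assumed placement) */
        token_t t = { j->weights[i], i, 0 };
        fifo_push(j->q, t);
    }
    token_t end = { 0, 0, 1 };
    fifo_push(j->q, end);
    return NULL;
}

/* Consumer: multiply-accumulate over the tokens it receives. */
static void *consumer(void *arg) {
    job_t *j = arg;
    int32_t acc = 0;
    for (;;) {
        token_t t = fifo_pop(j->q);
        if (t.last) break;
        acc += (int32_t)t.weight * (int32_t)j->acts[t.index];
    }
    j->result = acc;
    return NULL;
}

/* Runs the two-thread pipeline and returns the dot product. */
int32_t sparse_dot(const int8_t *weights, const int8_t *acts, size_t n) {
    fifo_t q;
    fifo_init(&q);
    job_t j = { &q, weights, acts, n, 0 };
    pthread_t p, c;
    int rc = pthread_create(&p, NULL, producer, &j);
    assert(rc == 0);
    rc = pthread_create(&c, NULL, consumer, &j);
    assert(rc == 0);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    pthread_mutex_destroy(&q.lock);
    pthread_cond_destroy(&q.not_empty);
    pthread_cond_destroy(&q.not_full);
    return j.result;
}
```

Compiled with `-pthread`, a call to `sparse_dot` with weights `{2, 0, -3, 0, 1}` and activations `{4, 9, 5, 7, -6}` enqueues three data tokens rather than five and returns -13. The consumer never sees the zero weights, so in a hardware mapping the multiplier would spend no cycles on them.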

