LG | SP | HEP-EX | Mar 11, 2020

Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML

arXiv:2003.06308v2 | 112 citations
AI Analysis

This work addresses efficient neural network deployment on FPGAs, using handwritten digit recognition and particle physics jet identification as example applications. It is incremental in that it extends the existing hls4ml library rather than introducing a new framework.

The authors compress deep neural networks for FPGA deployment by implementing binary and ternary precision in the hls4ml library, achieving performance similar to higher-precision models while drastically reducing resource consumption.

We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with FPGA firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance between latency and accuracy by retaining full precision on a selected subset of network components. As an example, we consider two multiclass classification tasks: handwritten digit recognition with the MNIST data set and jet identification with simulated proton-proton collisions at the CERN Large Hadron Collider. The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources.
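To make the idea of reducing parameters to binary or ternary precision concrete, here is a minimal sketch in plain Python. It is an illustration only, not the hls4ml API and not the training procedure used in the paper: the function names `binarize` and `ternarize`, the fixed `threshold` value, and the choice to map zero to +1 are all assumptions made for this example.

```python
def binarize(weights):
    """Map each weight to +1 or -1 by its sign.

    Assumption for this sketch: a weight of exactly zero maps to +1.
    """
    return [1 if w >= 0 else -1 for w in weights]


def ternarize(weights, threshold=0.5):
    """Map each weight to -1, 0, or +1.

    Weights whose magnitude does not exceed `threshold` become 0.
    The threshold value is a hypothetical choice for illustration.
    """
    result = []
    for w in weights:
        if w > threshold:
            result.append(1)
        elif w < -threshold:
            result.append(-1)
        else:
            result.append(0)
    return result


if __name__ == "__main__":
    example_weights = [0.8, -0.3, 0.1, -0.9]
    print(binarize(example_weights))   # [1, -1, 1, -1]
    print(ternarize(example_weights))  # [1, 0, 0, -1]
```

With values restricted to {-1, +1} or {-1, 0, +1}, each multiplication in a layer reduces to a sign flip or a skip, which suggests why such networks can need far fewer FPGA resources than higher-precision ones.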

Code Implementations: 2 repos
