LG · AI · Apr 7, 2021

NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function Combinational Logic

arXiv:2104.05421v1 · 17 citations
Originality: Highly original
AI Analysis

The paper addresses the need for efficient, low-latency DNN accelerators in time-critical applications, and it presents a novel method rather than an incremental improvement.

The paper tackles ultra-low-latency DNN inference for applications with sub-microsecond latency requirements. It introduces NullaNet Tiny, a framework that replaces expensive DNN operations with Boolean logic mapped to FPGA LUTs, and reports 2.36× lower latency and 24.42× lower LUT utilization than Xilinx's LogicNets at similar accuracy.

While there is a large body of research on efficient processing of deep neural networks (DNNs), ultra-low-latency realization of these models for applications with stringent, sub-microsecond latency requirements continues to be an unresolved, challenging problem. Field-programmable gate array (FPGA)-based DNN accelerators are gaining traction as a serious contender to replace graphics processing unit/central processing unit-based platforms considering their performance, flexibility, and energy efficiency. This paper presents NullaNet Tiny, an across-the-stack design and optimization framework for constructing resource- and energy-efficient, ultra-low-latency FPGA-based neural network accelerators. The key idea is to replace expensive operations required to compute various filter/neuron functions in a DNN with Boolean logic expressions that are mapped to the native look-up tables (LUTs) of the FPGA device (examples of such operations are multiply-and-accumulate and batch normalization). At about the same level of classification accuracy, compared to Xilinx's LogicNets, our design achieves 2.36× lower latency and 24.42× lower LUT utilization.
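The key idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes a single neuron with binary inputs, a fan-in of 6 (chosen to match a 6-input LUT), a sign-threshold activation, and made-up weights and batch-normalization statistics. All names (`neuron`, `build_truth_table`, `lut_eval`, `PARAMS`) are hypothetical. The sketch enumerates every input pattern once, stores the neuron's output as a truth table, and then evaluates the neuron by lookup alone.

```python
from itertools import product


def neuron(bits, weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Reference neuron: multiply-and-accumulate, batch norm, sign threshold."""
    acc = sum(w * b for w, b in zip(weights, bits)) + bias
    normed = gamma * (acc - mean) / (var + eps) ** 0.5 + beta
    return 1 if normed >= 0 else 0


def build_truth_table(fan_in, **params):
    """Enumerate all 2**fan_in binary input patterns and record the output.

    The resulting list plays the role of the LUT contents.
    """
    return [neuron(bits, **params) for bits in product((0, 1), repeat=fan_in)]


def lut_eval(table, bits):
    """Evaluate the neuron by table lookup only (no arithmetic on weights)."""
    index = 0
    for b in bits:  # first bit is the most significant, matching product() order
        index = (index << 1) | b
    return table[index]


# Hypothetical parameters, chosen only for illustration.
PARAMS = dict(
    weights=[0.5, -1.0, 0.75, 0.25, -0.5, 1.0],
    bias=0.1,
    gamma=1.0,
    beta=0.0,
    mean=0.2,
    var=1.0,
)
FAN_IN = 6
TABLE = build_truth_table(FAN_IN, **PARAMS)

if __name__ == "__main__":
    sample = (1, 0, 1, 1, 0, 0)
    print("arithmetic:", neuron(sample, **PARAMS))
    print("lookup:    ", lut_eval(TABLE, sample))
```

Once the table is built, the multiply-and-accumulate and batch-normalization steps no longer run at inference time; they are folded into 64 stored bits. A real flow would additionally need to keep fan-in small enough for enumeration to be practical and would apply logic minimization before mapping to the device, which this sketch leaves out.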

