AR · ET · Jun 4

FQA: A Full-Space Quantization-Driven Architecture for Hardware-Efficient Piecewise Approximation of Nonlinear Activation Functions

arXiv:2606.05627 · 9.6
Predicted impact: top 13% in AR · last 90 days
Originality: Incremental advance
AI Analysis

The paper addresses the need for efficient hardware implementations of activation functions in neural network accelerators and reports significant resource savings.

The paper proposes a full-space quantization-driven architecture (FQA) for hardware-efficient piecewise polynomial approximation of nonlinear activation functions, achieving a reduction of over 50% in area and power consumption for the Sigmoid hardware design compared to the state-of-the-art PPA architecture.

In this paper, we propose a full-space quantization-driven architecture (FQA) for the hardware-efficient piecewise polynomial approximations (PPAs) of nonlinear activation functions. FQA comprehensively considers both fractional-bit truncation error and quantization error that cause the deviation of the optimal approximation coefficients. Crucially, FQA can precisely determine and search the complete range of optimal coefficients. Based on the proposed FQA, we develop two distinct hardware implementation schemes to cater to different resource-performance trade-offs. Furthermore, we decouple all the fractional word lengths (FWLs) involved in the calculation process to enable the exploration of superior hardware architectures. To mitigate the increased software computation time caused by the expanded quantization space, we design an acceleration method named TBW (target-guided bisection window) to expedite the piecewise calculation and searching process. Experimental results demonstrate that, compared to existing architectures, FQA can significantly reduce the number of required segments while achieving the optimal Maximum Absolute Error (MAE). For the hardware design of the Sigmoid function, our approach achieves over 50% reduction in area and power consumption compared to the state-of-the-art PPA architecture. Finally, we present a complete design workflow for deploying PPA on configurable hardware, maximizing the utilization of existing hardware resources and minimizing MAE.
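To make the quantities in the abstract concrete, here is a minimal Python sketch of a generic piecewise-linear Sigmoid approximation with fixed-point coefficients. It is not the paper's FQA coefficient search or its TBW method, whose details the abstract does not give. The helper names (`quantize`, `fit_and_score`, `segment`), the word lengths (6, 12 and 8 fractional bits), the input range [0, 8] and the error target of 2^-7 are all illustrative assumptions. Each segment's end point is found by plain bisection, and every candidate is scored by its maximum absolute error after the coefficients and the output have been rounded.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def quantize(value, fwl):
    """Round to the nearest multiple of 2**-fwl (fwl = fractional word length)."""
    scale = 1 << fwl
    return round(value * scale) / scale

def fit_and_score(i0, i1, in_fwl, coeff_fwl, out_fwl):
    """Fit a line through the endpoints of grid indices [i0, i1], quantize its
    coefficients, and return (slope, intercept, max_abs_error on the grid)."""
    step = 2.0 ** -in_fwl
    x0, x1 = i0 * step, i1 * step
    slope = (sigmoid(x1) - sigmoid(x0)) / (x1 - x0)
    intercept = sigmoid(x0) - slope * x0
    slope, intercept = quantize(slope, coeff_fwl), quantize(intercept, coeff_fwl)
    worst = 0.0
    for i in range(i0, i1 + 1):
        x = i * step
        y = quantize(slope * x + intercept, out_fwl)  # output rounded to out_fwl bits
        worst = max(worst, abs(y - sigmoid(x)))
    return slope, intercept, worst

def segment(x_max, target, in_fwl=6, coeff_fwl=12, out_fwl=8):
    """Greedy left-to-right segmentation of [0, x_max] (x_max an integer).
    Bisection keeps `lo` as a segment end known to meet the target."""
    n = x_max << in_fwl  # number of grid steps in [0, x_max]
    segments, start = [], 0
    while start < n:
        lo, hi = start + 1, n
        if fit_and_score(start, lo, in_fwl, coeff_fwl, out_fwl)[2] > target:
            raise ValueError("target not reachable with these word lengths")
        if fit_and_score(start, hi, in_fwl, coeff_fwl, out_fwl)[2] <= target:
            lo = hi  # the rest of the range fits in one segment
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if fit_and_score(start, mid, in_fwl, coeff_fwl, out_fwl)[2] <= target:
                lo = mid
            else:
                hi = mid
        slope, intercept, err = fit_and_score(start, lo, in_fwl, coeff_fwl, out_fwl)
        step = 2.0 ** -in_fwl
        segments.append((start * step, lo * step, slope, intercept, err))
        start = lo
    return segments

if __name__ == "__main__":
    segs = segment(8, 2.0 ** -7)
    print(len(segs), "segments, overall error", max(s[4] for s in segs))
```

Because the search only ever accepts a segment end that it has already verified, the resulting table meets the target even where rounding makes the error non-monotonic in segment width; bisection may then settle for a shorter segment than the widest feasible one. Widening or narrowing the three word lengths independently changes the segment count, which is the kind of trade-off the abstract describes exploring.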
