LG, AI, AR, CV | Jan 31, 2023

Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance

arXiv:2301.13376v1 | 4 citations | h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses efficient hardware deployment of neural networks, particularly on custom FPGA-based accelerators, by enabling low-precision accumulation with little loss in accuracy relative to a floating-point baseline. It is an incremental advance, as it builds on existing quantization methods.

The paper tackles numerical overflow in low-precision accumulators for quantized neural networks by introducing a quantization-aware training algorithm that guarantees overflow avoidance. For benchmark models trained with 8-bit weights and activations, constraining the hidden layers to fit into 16-bit accumulators yields, on average, 98.2% sparsity and an estimated 46.5x compression rate while maintaining 99.2% of the floating-point performance.

We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference. We leverage weight normalization as a means of constraining parameters during training using accumulator bit width bounds that we derive. We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline. We then show that this reduction translates to increased design efficiency for custom FPGA-based accelerators. Finally, we show that our algorithm not only constrains weights to fit into an accumulator of user-defined bit width, but also increases the sparsity and compressibility of the resulting weights. Across all of our benchmark models trained with 8-bit weights and activations, we observe that constraining the hidden layers of quantized neural networks to fit into 16-bit accumulators yields an average 98.2% sparsity with an estimated compression rate of 46.5x all while maintaining 99.2% of the floating-point performance.
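To make the idea of an accumulator bit width bound concrete, here is a minimal sketch of a standard worst-case argument. It is not necessarily the exact bound the authors derive. It assumes integer weights and unsigned N-bit inputs (for example, activations after a ReLU), so that every partial sum of a dot product is bounded in magnitude by the L1 norm of the weights times the largest input value. The function names `max_abs_dot`, `min_accumulator_bits`, and `max_l1_norm` are hypothetical and chosen only for this illustration.

```python
def max_abs_dot(weights, input_bits):
    """Worst-case |sum(w_i * x_i)| for unsigned inputs in [0, 2**input_bits - 1].

    Assumption: weights are already-quantized integers.
    """
    l1_norm = sum(abs(w) for w in weights)
    return l1_norm * (2 ** input_bits - 1)


def min_accumulator_bits(weights, input_bits):
    """Smallest signed accumulator width P such that the worst case fits.

    A signed P-bit accumulator holds [-2**(P-1), 2**(P-1) - 1], so we need
    worst <= 2**(P-1) - 1. int.bit_length() gives the magnitude bits needed,
    and one extra bit is added for the sign.
    """
    worst = max_abs_dot(weights, input_bits)
    return worst.bit_length() + 1


def max_l1_norm(acc_bits, input_bits):
    """Largest integer L1 norm of the weights that cannot overflow a signed
    acc_bits-wide accumulator under the same worst-case assumption."""
    return (2 ** (acc_bits - 1) - 1) // (2 ** input_bits - 1)


if __name__ == "__main__":
    w = [3, -2, 0, 1]                      # toy integer weights (hypothetical)
    print(min_accumulator_bits(w, 8))      # bits needed for 8-bit unsigned inputs
    print(max_l1_norm(16, 8))              # L1 budget for a 16-bit accumulator
```

Under these assumptions, a 16-bit accumulator with 8-bit unsigned inputs leaves an integer L1 budget of 128 per dot product. Since every nonzero integer weight contributes at least 1 to the L1 norm, at most 128 weights per output can be nonzero, which gives some intuition for why constraining weights to a small accumulator would also push them toward sparsity.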

