LG · Sep 22, 2017

Computation Error Analysis of Block Floating Point Arithmetic Oriented Convolution Neural Network Accelerator Design

arXiv:1709.07776v2 · 47 citations
Originality: Incremental advance
AI Analysis

This work addresses efficient CNN deployment on embedded systems for applications that require low-power hardware. It is an incremental advance, since it builds on existing BFP methods.

The paper tackles the challenge of deploying large-scale CNNs on embedded platforms by analyzing block floating point (BFP) arithmetic, which reduces hardware cost and data traffic while maintaining accuracy. It finds that an 8-bit mantissa in the BFP representation induces less than 0.3% accuracy loss without retraining, across models such as VGG16 and ResNet-50.

The heavy burdens of computation and off-chip traffic impede the deployment of large-scale convolutional neural networks (CNNs) on embedded platforms. Because CNNs tolerate computation errors well, employing block floating point (BFP) arithmetic in CNN accelerators can reduce hardware cost and data traffic efficiently while maintaining classification accuracy. In this paper, we verify the effects of BFP word-width choices on CNN performance without retraining. Several typical CNN models, including VGG16, ResNet-18, ResNet-50 and GoogLeNet, were tested. Experiments revealed that an 8-bit mantissa, including the sign bit, in the BFP representation induced less than 0.3% accuracy loss. In addition, we investigate the computational errors theoretically and develop an upper bound on the noise-to-signal ratio (NSR), which provides promising guidance for BFP-based CNN engine design.
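
To make the idea in the abstract concrete, the following is a minimal sketch of block floating point quantization, not the paper's implementation. It assumes one shared exponent per block, taken from the largest magnitude in the block, and a signed integer mantissa of `mantissa_bits` bits (sign included) per value. The function names `bfp_quantize`, `bfp_dequantize` and `nsr`, the rounding mode, and the clipping rule are all illustrative assumptions. The `nsr` helper measures the empirical noise-to-signal ratio of one block; it does not reproduce the theoretical upper bound derived in the paper.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Quantize floats to one shared exponent plus signed integer mantissas.

    Assumption: the shared exponent comes from the largest magnitude in the
    block, and mantissa_bits includes the sign bit.
    """
    max_mag = max(abs(x) for x in block)
    if max_mag == 0.0:
        return 0, [0] * len(block)
    # frexp gives max_mag = m * 2**e with 0.5 <= m < 1, hence max_mag < 2**e.
    _, shared_exp = math.frexp(max_mag)
    frac_bits = mantissa_bits - 1            # one bit is spent on the sign
    step = math.ldexp(1.0, shared_exp - frac_bits)
    limit = (1 << frac_bits) - 1             # symmetric clipping range
    mantissas = [max(-limit, min(limit, round(x / step))) for x in block]
    return shared_exp, mantissas


def bfp_dequantize(shared_exp, mantissas, mantissa_bits=8):
    """Map the integer mantissas back to floats using the shared exponent."""
    step = math.ldexp(1.0, shared_exp - (mantissa_bits - 1))
    return [m * step for m in mantissas]


def nsr(original, approx):
    """Empirical noise-to-signal ratio: error energy over signal energy."""
    noise = sum((a - b) ** 2 for a, b in zip(original, approx))
    signal = sum(a * a for a in original)
    return noise / signal


if __name__ == "__main__":
    block = [0.9, -0.31, 0.07, 0.0042]
    exp, mant = bfp_quantize(block, mantissa_bits=8)
    restored = bfp_dequantize(exp, mant, mantissa_bits=8)
    print(exp, mant, restored, nsr(block, restored))
```

Because every value in a block shares the exponent of the largest one, small values keep fewer significant bits than large ones. In this sketch the per-value error stays within one quantization step, so the block's NSR depends on how widely the magnitudes inside the block are spread.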

