AR · LG · Nov 4, 2022

LightNorm: Area and Energy-Efficient Batch Normalization Hardware for On-Device DNN Training

arXiv:2211.02686v1 · 11 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses the problem of on-device DNN training efficiency for mobile and edge devices, offering an incremental improvement by optimizing batch normalization hardware.

The paper tackles the growing runtime overhead of batch normalization layers in mobile-friendly DNN training. It proposes LightNorm, an efficient hardware module that fuses three techniques (low bit-precision, range batch normalization, and block floating point) and reports significant area and energy savings without hurting training accuracy.

When training early deep neural networks (DNNs), generating intermediate features via convolution or linear layers occupied most of the execution time. Accordingly, extensive research has been done to reduce the computational burden of the convolution and linear layers. In recent mobile-friendly DNNs, however, the relative number of operations involved in processing these layers has dropped significantly. As a result, the share of execution time spent in other layers, such as batch normalization layers, has increased. Thus, in this work, we conduct a detailed analysis of the batch normalization layer to reduce the runtime overhead of the batch normalization process. Backed by this analysis, we present an extremely efficient batch normalization, named LightNorm, and its associated hardware module. In more detail, we fuse three approximation techniques: i) low bit-precision, ii) range batch normalization, and iii) block floating point. These approximation techniques are carefully applied not only to maintain the statistics of intermediate feature maps, but also to minimize off-chip memory accesses. By using the proposed LightNorm hardware, we can achieve significant area and energy savings during DNN training without hurting the training accuracy. This makes the proposed hardware a strong candidate for on-device training.
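To make two of the three techniques named in the abstract more concrete, here is a minimal software sketch. It is not the LightNorm hardware datapath, which the abstract does not specify. It assumes the commonly used range batch normalization formulation, in which the standard deviation is approximated by the range of the centered values scaled by 1/sqrt(2 ln n), and a simple block floating point scheme in which every value in a block shares one power-of-two exponent and keeps a signed integer mantissa. The function names `range_batch_norm` and `bfp_quantize`, the `mantissa_bits` parameter, and the `eps` value are illustrative assumptions, not names from the paper.

```python
import math


def range_batch_norm(xs, eps=1e-5):
    """Normalize one channel using the value range instead of the variance.

    Assumption: std is approximated as range / sqrt(2 * ln(n)), which is
    reasonable for roughly Gaussian inputs. Requires at least two values.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("range batch norm needs at least two values")
    mean = sum(xs) / n
    centered = [x - mean for x in xs]
    # max - min replaces the sum of squares, so no multiplications are
    # needed to gather the scale statistic.
    value_range = max(centered) - min(centered)
    scale = value_range / math.sqrt(2.0 * math.log(n))
    return [c / (scale + eps) for c in centered]


def bfp_quantize(block, mantissa_bits=8):
    """Round a block of floats to a shared-exponent (block floating point) grid.

    Hypothetical format: one exponent per block, one signed integer mantissa
    of `mantissa_bits` bits (sign included) per value.
    """
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return [0.0] * len(block)
    # frexp gives max_abs = m * 2**shared_exp with 0.5 <= m < 1.
    _, shared_exp = math.frexp(max_abs)
    # Spacing of representable values for this block.
    step = 2.0 ** (shared_exp - (mantissa_bits - 1))
    limit = 2 ** (mantissa_bits - 1) - 1  # largest mantissa magnitude
    quantized = []
    for v in block:
        mantissa = round(v / step)
        mantissa = max(-limit, min(limit, mantissa))  # saturate on overflow
        quantized.append(mantissa * step)
    return quantized


if __name__ == "__main__":
    features = [0.9, -1.3, 0.2, 2.1, -0.4, 0.7, -1.8, 0.05]
    normalized = range_batch_norm(features)
    print(bfp_quantize(normalized, mantissa_bits=8))
```

In this sketch the range statistic needs only comparisons and one subtraction per channel, and the shared exponent means each stored value carries only a short mantissa. Both properties are in the spirit of the area, energy, and memory-traffic savings the abstract describes, although the actual bit widths and block sizes used by LightNorm are not given here.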

