LG AR Nov 9, 2023

Exploiting Neural-Network Statistics for Low-Power DNN Inference

arXiv:2311.05557v1, 1 citation, h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses power efficiency for edge-AI inference engines. It offers significant power savings, but the advance is incremental because it builds on existing specialized compute blocks.

The paper tackles the interconnect and memory power bottleneck in DNN inference engines. It proposes a low-power technique that combines overhead-free coding with a statistical analysis of neural-network data and parameters. The technique reduces interconnect and memory power by up to 80% and saves up to 39% additional power in the compute blocks, with no accuracy loss and negligible hardware cost.

Specialized compute blocks have been developed for efficient DNN execution. However, due to the vast amount of data and parameter movement, the interconnects and on-chip memories form another bottleneck, impairing power and performance. This work addresses this bottleneck by contributing a low-power technique for edge-AI inference engines that combines overhead-free coding with a statistical analysis of the data and parameters of neural networks. Our approach reduces the interconnect and memory power consumption by up to 80% for state-of-the-art benchmarks while providing additional power savings of up to 39% for the compute blocks. These power improvements are achieved with no loss of accuracy and negligible hardware cost.
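
To make the idea of pairing an overhead-free code with value statistics more concrete, here is a minimal sketch. It is not the paper's coding scheme, which this summary does not specify. It assumes 8-bit quantized weights that are roughly Gaussian and centred on zero, and it compares bit toggles on a bus (a common proxy for dynamic interconnect power) under two's-complement and sign-magnitude encodings. Sign-magnitude is used here only as an illustrative stand-in for an encoding that needs no extra wires; all names, the word width, and the distribution parameters are hypothetical.

```python
import random

WIDTH = 8  # assumed bus word width (8-bit quantized values)


def twos_complement(value: int, width: int = WIDTH) -> int:
    """Encode a signed integer as a two's-complement bit pattern."""
    return value & ((1 << width) - 1)


def sign_magnitude(value: int, width: int = WIDTH) -> int:
    """Encode a signed integer as a sign bit plus magnitude (same width, no extra bits)."""
    sign = 1 if value < 0 else 0
    return (sign << (width - 1)) | abs(value)


def toggle_count(words) -> int:
    """Count bit flips between consecutive words sent over the same wires."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def sample_weights(n: int, sigma: float = 4.0, seed: int = 0):
    """Draw zero-centred, small-magnitude integers (an assumed weight distribution)."""
    rng = random.Random(seed)
    limit = (1 << (WIDTH - 1)) - 1  # clip to a range both encodings can represent
    return [max(-limit, min(limit, round(rng.gauss(0.0, sigma)))) for _ in range(n)]


weights = sample_weights(10_000)
tc = toggle_count([twos_complement(w) for w in weights])
sm = toggle_count([sign_magnitude(w) for w in weights])

print(f"two's-complement toggles: {tc}")
print(f"sign-magnitude toggles:   {sm}")
print(f"toggle reduction:         {1 - sm / tc:.1%}")
```

In this toy model, a sign change between consecutive small values flips all the sign-extension bits in two's complement but only one bit in sign-magnitude, so the same data stream causes fewer toggles without widening the bus. The savings reported in the abstract come from the authors' own technique and benchmarks, not from this sketch.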

