ARLG, May 2, 2022

Pre-RTL DNN Hardware Evaluator With Fused Layer Support

arXiv:2205.01729v1, 1 citation, h-index: 30
Originality: Incremental advance
AI Analysis

This work addresses the time-to-market problem faced by hardware designers developing DNN accelerators, though the contribution is incremental because it builds on existing accelerator architectures.

The paper tackles the lengthy hardware design process for DNN accelerators by proposing a pre-RTL evaluator that supports both layer-by-layer and fused layer processing. With layer fusion, it reports a 55.6% memory bandwidth reduction, a 36.7% latency improvement, and a 49.2% energy reduction compared with layer-by-layer processing.
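Why fusion cuts external bandwidth can be illustrated with a first-order DRAM traffic model. This is a minimal sketch under stated assumptions, not the paper's evaluator: the `Layer` fields, the helper names, and the byte counts are hypothetical, and tiling, halo recomputation, and partial buffering are ignored. In layer-by-layer processing, every layer reads its input feature map and weights from DRAM and writes its output back. In a fused group, only the group's input, its final output, and the weights cross the off-chip interface.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    # Hypothetical per-layer sizes in bytes.
    name: str
    ifmap_bytes: int   # input feature map
    ofmap_bytes: int   # output feature map
    weight_bytes: int  # weights


def layer_by_layer_traffic(layers):
    """Each layer reads its input and weights from DRAM and writes its output back."""
    return sum(l.ifmap_bytes + l.weight_bytes + l.ofmap_bytes for l in layers)


def fused_traffic(layers, groups):
    """Intermediate feature maps inside a fused group stay on chip.

    groups: list of (start, end) pairs, end exclusive, covering the layers in order.
    """
    total = 0
    for start, end in groups:
        group = layers[start:end]
        total += group[0].ifmap_bytes                # group input read from DRAM
        total += group[-1].ofmap_bytes               # group output written to DRAM
        total += sum(l.weight_bytes for l in group)  # weights are still fetched
    return total


# Made-up three-layer example (not data from the paper).
layers = [
    Layer("conv1", ifmap_bytes=100, ofmap_bytes=200, weight_bytes=10),
    Layer("conv2", ifmap_bytes=200, ofmap_bytes=200, weight_bytes=20),
    Layer("conv3", ifmap_bytes=200, ofmap_bytes=50, weight_bytes=30),
]

print("layer-by-layer:", layer_by_layer_traffic(layers))
print("fully fused:   ", fused_traffic(layers, [(0, 3)]))
```

A useful sanity check of the model: splitting the network into single-layer groups reproduces the layer-by-layer traffic exactly, so fusion can only remove the intermediate feature-map round trips.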

With the growing popularity of deep neural networks (DNNs), hardware accelerators are in demand for real-time execution. However, the lengthy design process and fast-evolving DNN models make it difficult for hardware evaluation to meet time-to-market needs. This paper proposes a pre-RTL DNN hardware evaluator that supports conventional layer-by-layer processing as well as fused layer processing, which lowers the external bandwidth requirement. The evaluator supports two state-of-the-art accelerator architectures and finds the best hardware configuration and layer fusion grouping. The experimental results show that the layer fusion scheme can achieve a 55.6% memory bandwidth reduction, a 36.7% latency improvement, and a 49.2% energy reduction compared with layer-by-layer operation.
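Choosing fusion groups can be framed as partitioning the layer sequence into contiguous groups. The sketch below is an illustrative stand-in, not the paper's search: it minimizes only DRAM traffic (the paper also weighs hardware choice, latency, and energy), and `group_fits` is a hypothetical capacity rule requiring every intermediate feature map in a group to fit in an on-chip buffer of `buffer_bytes`. Dynamic programming over the end index finds the lowest-traffic partition. Single-layer groups are always feasible, so a solution always exists.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    # Hypothetical per-layer sizes in bytes.
    name: str
    ifmap_bytes: int
    ofmap_bytes: int
    weight_bytes: int


def group_traffic(layers, start, end):
    """DRAM traffic of fusing layers[start:end]: group input, group output, all weights."""
    group = layers[start:end]
    return (group[0].ifmap_bytes + group[-1].ofmap_bytes
            + sum(l.weight_bytes for l in group))


def group_fits(layers, start, end, buffer_bytes):
    """Hypothetical capacity rule: every intermediate feature map
    (outputs of all but the last layer in the group) must fit on chip."""
    return all(l.ofmap_bytes <= buffer_bytes for l in layers[start:end - 1])


def best_fusion_groups(layers, buffer_bytes):
    """Dynamic programming over contiguous partitions, minimizing DRAM traffic."""
    n = len(layers)
    best = [0] + [float("inf")] * n  # best[i]: min traffic for layers[:i]
    choice = [0] * (n + 1)           # choice[i]: start of the last group ending at i
    for end in range(1, n + 1):
        for start in range(end):
            if not group_fits(layers, start, end, buffer_bytes):
                continue
            cost = best[start] + group_traffic(layers, start, end)
            if cost < best[end]:
                best[end] = cost
                choice[end] = start
    # Walk the choices backwards to recover the groups.
    groups, end = [], n
    while end > 0:
        groups.append((choice[end], end))
        end = choice[end]
    return best[n], groups[::-1]


# Made-up three-layer example (not data from the paper).
layers = [
    Layer("conv1", ifmap_bytes=100, ofmap_bytes=200, weight_bytes=10),
    Layer("conv2", ifmap_bytes=200, ofmap_bytes=200, weight_bytes=20),
    Layer("conv3", ifmap_bytes=200, ofmap_bytes=50, weight_bytes=30),
]

for buf in (200, 150):
    traffic, groups = best_fusion_groups(layers, buf)
    print(f"buffer={buf}: traffic={traffic}, groups={groups}")
```

With a large enough buffer the search fuses all three layers. When no intermediate map fits, it falls back to single-layer groups, which is equivalent to layer-by-layer processing.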
