Nov 24, 2021

Algorithm and Hardware Co-design for Reconfigurable CNN Accelerator

arXiv:2111.12787v1
13 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of efficiently exploring the vast joint design space of neural architectures and hardware configurations for CNN accelerators. The problem matters to researchers and engineers in deep learning and hardware design, but the contribution is incremental because it builds on existing co-design methods.

The paper treats algorithm-hardware co-design for CNNs as an optimization problem and proposes a three-phase framework that decouples training from design space exploration, searches a hardware-friendly neural architecture space, and uses Gaussian process prediction to avoid costly hardware implementation steps. The reported results are up to 5% higher accuracy and up to 3x speedup over manually designed models on ImageNet, and 2-6% higher accuracy, 2-26x lower latency, and 8.5x higher energy efficiency than other state-of-the-art co-design frameworks.
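To make the Gaussian process prediction step concrete, here is a minimal sketch of a GP regressor that predicts one metric (latency, in this toy case) from a design's configuration instead of running synthesis. It is not the paper's implementation: the RBF kernel, the two normalized features, the hyperparameters, and the latency values are all assumptions chosen for illustration, and only the posterior mean is computed.

```python
import math

def rbf(a, b, length_scale):
    """Squared-exponential (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_fit(X, y, length_scale=0.5, noise=1e-6):
    """Precompute alpha = (K + noise * I)^-1 (y - mean) for the posterior mean."""
    mean = sum(y) / len(y)
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = solve(K, [v - mean for v in y])
    return {"X": X, "alpha": alpha, "mean": mean, "length_scale": length_scale}

def gp_predict(model, x):
    """Posterior mean at x: mean + k(x, X) . alpha."""
    k = [rbf(x, xi, model["length_scale"]) for xi in model["X"]]
    return model["mean"] + sum(ki * ai for ki, ai in zip(k, model["alpha"]))

# Hypothetical training data: (normalized parallelism, normalized channel width)
# mapped to a latency in ms measured on a few already-implemented designs.
configs = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [0.0, 1.0]]
latency_ms = [10.0, 6.0, 3.0, 7.0]

model = gp_fit(configs, latency_ms)
estimate = gp_predict(model, [0.75, 0.75])  # unseen design, no synthesis run
```

In a co-design loop, one such predictor per metric (accuracy, latency, power) would let the search score many candidate designs cheaply and reserve real hardware runs for the most promising ones.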

Recent advances in algorithm-hardware co-design for deep neural networks (DNNs) have demonstrated their potential in automatically designing neural architectures and hardware designs. Nevertheless, it is still a challenging optimization problem due to the expensive training cost and the time-consuming hardware implementation, which make exploring the vast design space of neural architectures and hardware designs intractable. In this paper, we demonstrate that our proposed approach is capable of locating designs on the Pareto frontier. This capability is enabled by a novel three-phase co-design framework with the following new features: (a) decoupling DNN training from the design space exploration of hardware architecture and neural architecture, (b) providing a hardware-friendly neural architecture space by considering hardware characteristics in constructing the search cells, and (c) adopting a Gaussian process to predict accuracy, latency, and power consumption to avoid time-consuming synthesis and place-and-route processes. In comparison with the manually designed ResNet101, InceptionV2, and MobileNetV2, we can achieve up to 5% higher accuracy with up to 3x speedup on the ImageNet dataset. Compared with other state-of-the-art co-design frameworks, our found network and hardware configuration can achieve 2% ~ 6% higher accuracy, 2x ~ 26x lower latency, and 8.5x higher energy efficiency.
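As an illustration of what "locating designs on the Pareto frontier" means here, the following sketch filters a set of candidate designs down to the non-dominated ones across accuracy (higher is better), latency, and power (lower is better). The `Design` record, its field names, and the candidate values are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple

class Design(NamedTuple):
    """One candidate (network, hardware configuration) pair with its metrics."""
    name: str
    accuracy: float    # higher is better
    latency_ms: float  # lower is better
    power_w: float     # lower is better

def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = (a.accuracy >= b.accuracy
                and a.latency_ms <= b.latency_ms
                and a.power_w <= b.power_w)
    strictly_better = (a.accuracy > b.accuracy
                       or a.latency_ms < b.latency_ms
                       or a.power_w < b.power_w)
    return no_worse and strictly_better

def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs)]

# Made-up candidates, for illustration only.
candidates = [
    Design("A", 0.76, 12.0, 5.0),
    Design("B", 0.74, 8.0, 4.0),
    Design("C", 0.73, 9.0, 4.5),   # dominated by B on all three objectives
    Design("D", 0.78, 20.0, 6.0),
]
front = pareto_front(candidates)
```

Designs A, B, and D each win on at least one objective against every other candidate, so they remain on the frontier, while C is discarded. The pairwise check is quadratic in the number of candidates, which is acceptable for a sketch but would need a faster method for very large design spaces.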
