AR, AI, LG · Jul 6, 2021

Impact of On-Chip Interconnect on In-Memory Acceleration of Deep Neural Networks

arXiv:2107.02358v1 · 25 citations
Originality: Incremental advance
AI Analysis

The paper addresses the data-movement bottleneck that hardware designers of DNN accelerators face, particularly in edge and high-connection-density applications. The advance is incremental because it builds on existing IMC and NoC technologies.

The paper tackles the problem of on-chip interconnect inefficiency in in-memory computing (IMC) accelerators for deep neural networks (DNNs), showing that optimizing interconnect choice can achieve up to 6× improvement in energy-delay-area product for VGG-19 inference compared to state-of-the-art ReRAM-based IMC architectures.

With the widespread use of Deep Neural Networks (DNNs), machine learning algorithms have evolved in two diverse directions -- one with ever-increasing connection density for better accuracy and the other with more compact sizing for energy efficiency. The increase in connection density increases on-chip data movement, which makes efficient on-chip communication a critical function of the DNN accelerator. The contribution of this work is threefold. First, we illustrate that the point-to-point (P2P)-based interconnect is incapable of handling a high volume of on-chip data movement for DNNs. Second, we evaluate P2P and network-on-chip (NoC) interconnect (with a regular topology such as a mesh) for SRAM- and ReRAM-based in-memory computing (IMC) architectures for a range of DNNs. This analysis shows the necessity for the optimal interconnect choice for an IMC DNN accelerator. Finally, we perform an experimental evaluation for different DNNs to empirically obtain the performance of the IMC architecture with both NoC-tree and NoC-mesh. We conclude that, at the tile level, NoC-tree is appropriate for compact DNNs employed at the edge, and NoC-mesh is necessary to accelerate DNNs with high connection density. Furthermore, we propose a technique to determine the optimal choice of interconnect for any given DNN. In this technique, we use analytical models of NoC to evaluate end-to-end communication latency of any given DNN. We demonstrate that the interconnect optimization in the IMC architecture results in up to 6× improvement in energy-delay-area product for VGG-19 inference compared to the state-of-the-art ReRAM-based IMC architectures.
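
The headline metric above, energy-delay-area product (EDAP), is the product of inference energy, inference latency, and chip area, with lower values being better. The following is a minimal sketch of how an EDAP improvement factor between two design points could be computed. The `DesignPoint` class, the `edap_improvement` helper, and every numeric value are hypothetical placeholders introduced for illustration; none of them come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    """One accelerator configuration (hypothetical container, not from the paper)."""

    energy_j: float   # inference energy in joules
    delay_s: float    # inference latency in seconds
    area_mm2: float   # chip area in square millimetres

    def edap(self) -> float:
        # Energy-delay-area product: lower is better.
        return self.energy_j * self.delay_s * self.area_mm2


def edap_improvement(baseline: DesignPoint, candidate: DesignPoint) -> float:
    """Return how many times lower the candidate's EDAP is than the baseline's."""
    return baseline.edap() / candidate.edap()


# Placeholder values chosen only to exercise the arithmetic; NOT measured results.
baseline = DesignPoint(energy_j=2.0, delay_s=3.0, area_mm2=4.0)
candidate = DesignPoint(energy_j=1.0, delay_s=2.0, area_mm2=4.0)

print(edap_improvement(baseline, candidate))  # prints 3.0 (24.0 / 8.0)
```

Because EDAP multiplies its three terms, a gain in one term can be offset by a loss in another, so two interconnect choices should be compared on the product rather than on latency or energy alone.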

