AR LG · Apr 18, 2023

Heterogeneous Integration of In-Memory Analog Computing Architectures with Tensor Processing Units

Mohammed E. Elbtity, Brendan Reidy, Md Hasibul Amin, Ramtin Zand

arXiv:2304.09258v15.110 citations · h-index: 15

Originality: Incremental advance

AI Analysis

This work addresses hardware bottlenecks in mobile CNN inference for settings such as edge computing, offering incremental improvements in energy efficiency and performance.

The paper tackles the inefficiency of tensor processing units (TPUs) in fully connected layers of convolutional neural networks by integrating an in-memory analog computing (IMAC) unit with an edge TPU, achieving up to 2.59x performance improvements and 88% memory reductions while maintaining comparable accuracy.

Tensor processing units (TPUs), specialized hardware accelerators for machine learning tasks, have shown significant performance improvements when executing convolutional layers in convolutional neural networks (CNNs). However, they struggle to maintain the same efficiency in fully connected (FC) layers, leading to suboptimal hardware utilization. In-memory analog computing (IMAC) architectures, on the other hand, have demonstrated notable speedup in executing FC layers. This paper introduces a novel, heterogeneous, mixed-signal, and mixed-precision architecture that integrates an IMAC unit with an edge TPU to enhance mobile CNN performance. To leverage the strengths of TPUs for convolutional layers and IMAC circuits for dense layers, we propose a unified learning algorithm that incorporates mixed-precision training techniques to mitigate potential accuracy drops when deploying models on the TPU-IMAC architecture. The simulations demonstrate that the TPU-IMAC configuration achieves up to $2.59\times$ performance improvements, and $88\%$ memory reductions compared to conventional TPU architectures for various CNN models while maintaining comparable accuracy. The TPU-IMAC architecture shows potential for various applications where energy efficiency and high performance are essential, such as edge computing and real-time processing in mobile devices. The unified training algorithm and the integration of IMAC and TPU architectures contribute to the potential impact of this research on the broader machine learning landscape.
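The layer partitioning described in the abstract (convolutional layers on the TPU, dense layers on the IMAC unit) can be illustrated with a small sketch of how mixed precision affects weight storage. This is a minimal illustration under stated assumptions, not the authors' method: the bit widths (8-bit weights on the TPU, 2-bit weights on the IMAC unit), the `Layer` type, the helper names, and the LeNet-style toy model are all hypothetical and are not taken from the paper, so the resulting figure should not be read as reproducing the reported $88\%$ reduction.

```python
from dataclasses import dataclass

# Assumed bit widths, for illustration only (not taken from the paper).
TPU_WEIGHT_BITS = 8    # assumed int8 weights for layers kept on the TPU
IMAC_WEIGHT_BITS = 2   # assumed low-precision weights for layers moved to IMAC


@dataclass
class Layer:
    name: str
    kind: str          # "conv" or "fc"
    num_weights: int


def assign_unit(layer: Layer) -> str:
    """Route dense (FC) layers to the IMAC unit and everything else to the TPU."""
    return "IMAC" if layer.kind == "fc" else "TPU"


def weight_bits(layers, heterogeneous: bool) -> int:
    """Total weight storage in bits, with or without the IMAC offload."""
    total = 0
    for layer in layers:
        on_imac = heterogeneous and assign_unit(layer) == "IMAC"
        bits = IMAC_WEIGHT_BITS if on_imac else TPU_WEIGHT_BITS
        total += layer.num_weights * bits
    return total


def memory_reduction(layers) -> float:
    """Fraction of weight memory saved relative to an all-TPU baseline."""
    baseline = weight_bits(layers, heterogeneous=False)
    hybrid = weight_bits(layers, heterogeneous=True)
    return 1.0 - hybrid / baseline


# Hypothetical LeNet-style toy model (weights only, biases ignored).
toy_model = [
    Layer("conv1", "conv", 5 * 5 * 1 * 6),
    Layer("conv2", "conv", 5 * 5 * 6 * 16),
    Layer("fc1", "fc", 400 * 120),
    Layer("fc2", "fc", 120 * 84),
    Layer("fc3", "fc", 84 * 10),
]

if __name__ == "__main__":
    for layer in toy_model:
        print(f"{layer.name}: {assign_unit(layer)}")
    print(f"weight memory reduction: {memory_reduction(toy_model):.1%}")
```

The sketch shows why the savings depend on the model: the larger the share of weights in FC layers, the more storage the lower-precision IMAC path removes, while a model dominated by convolutional weights would see little change.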
