DISCA: A Digital In-memory Stochastic Computing Architecture Using A Compressed Bent-Pyramid Format
This work addresses the hardware constraints for AI at the edge, such as in robotics and surveillance, by offering a scalable and reliable solution. The contribution appears incremental, however, since it builds on existing in-memory computing and combines it with a novel data format.
The paper tackles energy inefficiency in AI hardware for edge applications by proposing DISCA, a digital in-memory stochastic computing architecture. Post-layout modeling in a commercial 180 nm CMOS technology shows an energy efficiency of 3.59 TOPS/W per bit at 500 MHz, which the authors present as an orders-of-magnitude improvement in matrix multiplication energy efficiency if scaled and compared with counterpart architectures.
We are witnessing an Artificial Intelligence (AI) revolution that dominates the technology landscape in application domains such as healthcare, robotics, automotive, security, and defense. Massive-scale AI models, which mimic the human brain's functionality, typically feature millions or even billions of parameters and rely on data-intensive matrix multiplication. While conventional von Neumann architectures struggle with the memory wall and the end of Moore's Law, these AI applications are migrating rapidly towards the edge, such as in robotics and unmanned aerial vehicles for surveillance, thereby adding more constraints to the hardware budget of AI architectures at the edge. Although in-memory computing has been proposed as a promising solution for the memory wall, both analog and digital in-memory computing architectures suffer from substantial degradation of their proposed benefits due to various design limitations. We propose a new digital in-memory stochastic computing architecture, DISCA, utilizing a compressed version of the quasi-stochastic Bent-Pyramid data format. DISCA inherits the computational simplicity of analog computing while preserving the scalability, productivity, and reliability of digital systems. Post-layout modeling results of DISCA show an energy efficiency of 3.59 TOPS/W per bit at 500 MHz using a commercial 180 nm CMOS technology. DISCA therefore improves the energy efficiency of matrix multiplication workloads by orders of magnitude if scaled and compared with its counterpart architectures.
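As a rough illustration of the computational simplicity that stochastic computing offers, the sketch below multiplies two values in [0, 1] by AND-ing two independent random bitstreams, which is the conventional unipolar stochastic computing scheme. It is not the Bent-Pyramid format or the DISCA datapath, neither of which is specified in this abstract; the function names, the stream length, and the seed are assumptions made purely for illustration.

```python
import random


def to_bitstream(p, length, rng):
    """Encode p in [0, 1] as a random bitstream whose fraction of 1s is about p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_bitstream(bits):
    """Decode a bitstream back to a value in [0, 1] by counting its 1s."""
    return sum(bits) / len(bits)


def sc_multiply(a, b, length=4096, seed=0):
    """Estimate a * b with a bitwise AND of two independent bitstreams.

    Generic unipolar stochastic computing, assumed here for illustration;
    this is NOT the compressed Bent-Pyramid format used by DISCA.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    stream_a = to_bitstream(a, length, rng)
    stream_b = to_bitstream(b, length, rng)
    # For independent streams, P(bit_a AND bit_b = 1) = a * b,
    # so a single AND gate per bit acts as the multiplier.
    product_bits = [x & y for x, y in zip(stream_a, stream_b)]
    return from_bitstream(product_bits)


if __name__ == "__main__":
    estimate = sc_multiply(0.5, 0.4)
    print(f"stochastic estimate of 0.5 * 0.4: {estimate:.4f} (exact: 0.2)")
```

In this generic scheme, longer streams reduce the estimation error at the cost of more bits to process.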