A 65nm 8b-Activation 8b-Weight SRAM-Based Charge-Domain Computing-in-Memory Macro Using A Fully-Parallel Analog Adder Network and A Single-ADC Interface
This work targets the memory-wall bottleneck in energy-efficient AI hardware; its contribution is incremental, building on existing SRAM-based charge-domain CiM approaches.
The paper tackles the throughput limitations of SRAM-based charge-domain computing-in-memory for high-performance multi-bit-quantization applications. It presents a macro that completes the multiply-accumulate (MAC) and ReLU of two signed 8-bit vectors in one cycle with a single A/D conversion, achieving 51.2 GOPS throughput, 10.3 TOPS/W energy efficiency, and 88.6% accuracy on CIFAR-10.
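As a quick sanity check on the headline figures, the energy per operation and the power implied at full throughput follow directly from the two numbers. This assumes both figures were measured at the same operating point, which the summary does not state.

```latex
\begin{aligned}
E_{\text{op}} &= \frac{1}{10.3\ \text{TOPS/W}}
  = \frac{1\ \text{J}}{10.3\times10^{12}\ \text{ops}}
  \approx 97.1\ \text{fJ/op},\\
P &= \frac{51.2\times10^{9}\ \text{ops/s}}{10.3\times10^{12}\ \text{ops/J}}
  \approx 4.97\ \text{mW}.
\end{aligned}
```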
Achieving both high performance and power efficiency on data-intensive tasks is challenging in the von Neumann architecture because of the memory wall bottleneck. Computing-in-memory (CiM) is a promising mitigation approach: it enables parallel in-situ multiply-accumulate (MAC) operations within the memory, supported by the peripheral interface and datapath. SRAM-based charge-domain CiM (CD-CiM) has shown potential for enhanced power efficiency and computing accuracy. However, existing SRAM-based CD-CiM designs face scaling challenges in meeting the throughput requirements of high-performance multi-bit-quantization applications. This paper presents an SRAM-based, high-throughput, ReLU-optimized CD-CiM macro. It completes the MAC and ReLU of two signed 8b vectors in one CiM cycle with only one A/D conversion. Together with non-linearity compensation for the analog computing and A/D conversion interfaces, this work achieves 51.2 GOPS throughput and 10.3 TOPS/W energy efficiency, with 88.6% accuracy on the CIFAR-10 dataset.
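To make concrete what "MAC and ReLU of two signed 8b vectors in one CiM cycle with only one A/D conversion" computes, here is a minimal idealized behavioral model in Python. It is a sketch under stated assumptions, not the macro's circuit. The bit-plane decomposition is one common way to build a multi-bit MAC from 1b x 1b products; the abstract does not confirm that the paper's analog adder network works this way. The vector length, ADC resolution (`adc_bits`), full-scale range, and all function names (`to_bits`, `bitplane_mac`, `relu_then_adc`) are hypothetical, and analog non-idealities and the paper's non-linearity compensation are not modeled.

```python
"""Idealized behavioral model (hypothetical, not the paper's circuit) of a
signed 8b x 8b MAC followed by ReLU and a single A/D conversion."""

BITS = 8  # operand precision stated in the abstract


def to_bits(x: int, bits: int = BITS) -> list[int]:
    """Two's-complement bits of x, LSB first."""
    if not -(1 << (bits - 1)) <= x < (1 << (bits - 1)):
        raise ValueError(f"{x} does not fit in signed {bits}b")
    u = x & ((1 << bits) - 1)
    return [(u >> i) & 1 for i in range(bits)]


def bit_weight(i: int, bits: int = BITS) -> int:
    """Place value of bit i in two's complement; the MSB carries a negative weight."""
    return -(1 << i) if i == bits - 1 else (1 << i)


def bitplane_mac(acts: list[int], wts: list[int]) -> int:
    """Signed dot product assembled from 1b x 1b partial sums.

    Each (i, j) bit-plane pair contributes popcount(a_i AND w_j), scaled by the
    product of the two place values. Summing all 64 pairs together is an
    idealized stand-in for combining every bit significance in one step.
    """
    a_bits = [to_bits(a) for a in acts]
    w_bits = [to_bits(w) for w in wts]
    total = 0
    for i in range(BITS):
        for j in range(BITS):
            partial = sum(ab[i] & wb[j] for ab, wb in zip(a_bits, w_bits))
            total += bit_weight(i) * bit_weight(j) * partial
    return total


def relu_then_adc(mac: int, n: int, adc_bits: int = 8) -> int:
    """Apply ReLU, then one uniform conversion over [0, full_scale].

    adc_bits and the full-scale choice are assumptions for illustration.
    """
    full_scale = n * (1 << (BITS - 1)) ** 2  # largest product is (-128) * (-128)
    y = max(0, mac)  # ReLU: negative MAC results map to code 0
    levels = 1 << adc_bits
    return min(levels - 1, y * levels // full_scale)


if __name__ == "__main__":
    acts = [10, -20, 127, -128]
    wts = [3, 5, -7, -128]
    mac = bitplane_mac(acts, wts)
    assert mac == sum(a * w for a, w in zip(acts, wts))
    print(mac, relu_then_adc(mac, len(acts)))  # prints "15425 60"
```

In this idealized model, applying ReLU before the conversion means the ADC only has to span non-negative values, so all of its output codes are spent on the range that survives the activation.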