CRAM-ER: Error-Resilient Spintronic Computational Random Access Memory for Scalable In-Memory Computation
This work addresses the scalability and reliability bottlenecks of CRAM for DNN acceleration, offering a practical solution for energy-efficient in-memory computing.
The paper proposes CRAM-ER, an error-resilient spintronic CRAM architecture for scalable in-memory matrix-vector multiplications. It achieves near-lossless DNN accuracy while reducing CRAM latency by up to two orders of magnitude, and it outperforms a CPU/GPU paired with high-bandwidth DRAM in both energy efficiency and energy-delay product.
Deep neural networks (DNNs) have achieved state-of-the-art performance across diverse domains. However, typical von Neumann compute paradigms face severe memory bottlenecks. Emerging near-memory and compute-in-memory approaches alleviate these bottlenecks but incur significant peripheral overhead. Computational Random Access Memory (CRAM) based on MRAM enables in-situ logic without peripheral overhead, offering a dense, energy-efficient solution. However, probabilistic MRAM switching induces gate-level errors that limit the scalability and reliability of CRAM for accelerating DNNs. Moreover, the large number of sequential MRAM writes severely constrains CRAM throughput. To address these challenges, we propose an error-resilient CRAM (CRAM-ER) architecture for scalable in-memory matrix-vector multiplications (MVMs). Our error-aware hardware-software co-design framework leverages a hybrid architecture that combines spintronic CRAM with a CMOS adder tree to mitigate the impact of device-level errors, demonstrating MVM functionality with high area and energy efficiency. We further develop error-aware model fine-tuning and fine-grained error correction for enhanced error resilience. Evaluations of the hybrid CMOS-spintronic architecture on DNN benchmarks show near-lossless accuracy while reducing CRAM latency by up to two orders of magnitude, outperforming a CPU/GPU paired with high-bandwidth DRAM in both energy efficiency and energy-delay product.
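To make the effect of gate-level errors on an MVM concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes (these are illustrative assumptions, not details from the abstract) that bit-level partial products are formed by in-memory AND gates whose outputs flip independently with probability `p_err`, and that accumulation is done by an error-free adder tree standing in for the CMOS part of the hybrid design. All function names and parameters are hypothetical.

```python
import random


def noisy_and(a_bit, b_bit, p_err, rng):
    """AND gate whose output flips with probability p_err.

    Assumption: an independent bit flip is a toy stand-in for
    probabilistic MRAM switching; real error statistics may differ.
    """
    out = a_bit & b_bit
    if rng.random() < p_err:
        out ^= 1
    return out


def adder_tree(values):
    """Exact pairwise reduction, modelling an error-free adder tree."""
    vals = list(values)
    if not vals:
        return 0
    while len(vals) > 1:
        nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:  # carry an unpaired element to the next level
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]


def noisy_dot(weights, inputs, bits, p_err, rng):
    """Dot product of unsigned `bits`-bit integers.

    Bit-level partial products come from noisy AND gates; they are
    shifted to their binary weight and summed exactly.
    """
    partials = []
    for w, x in zip(weights, inputs):
        for i in range(bits):
            for j in range(bits):
                bit = noisy_and((w >> i) & 1, (x >> j) & 1, p_err, rng)
                partials.append(bit << (i + j))
    return adder_tree(partials)


def noisy_mvm(matrix, vector, bits=4, p_err=0.0, seed=0):
    """Matrix-vector product under the toy gate-error model above."""
    rng = random.Random(seed)
    return [noisy_dot(row, vector, bits, p_err, rng) for row in matrix]


if __name__ == "__main__":
    W = [[1, 2], [3, 4]]
    x = [5, 6]
    print("exact :", noisy_mvm(W, x, p_err=0.0))
    print("noisy :", noisy_mvm(W, x, p_err=0.05))
```

With `p_err=0.0` the sketch reduces to an exact integer MVM. Raising `p_err` perturbs individual partial products, and a flip in a high-order bit position shifts the result far more than one in a low-order position, which is one intuition for why error handling at a fine granularity can matter.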