61.1ARJun 4
Space-CIM: Enabling Compute-In-Memory Accelerators for Thermally-Constrained Space PlatformsSohan Salahuddin Mugdho, Md. Shahedul Hasan, Cheng Wang
The rapid growth in compute demand from artificial intelligence (AI) has driven a massive surge in data center construction, precipitating an energy and sustainability crisis. Motivated by the abundant solar energy in outer space and the recent sharp reduction in space launch costs, orbital data centers are emerging as a potential pathway for the future scaling of AI compute infrastructure. While the cold background in vacuum seems appealing for cooling, computing systems operating in space without convection ultimately rely on radiative cooling, requiring large-area radiators. Such limitations in thermal management pose a significant challenge for deploying the standard liquid/air-cooled computers in space. In this work, we investigate the impact of the thermal constraints in space on both graphics processing units (GPUs) with high-bandwidth memory (HBM) and the emerging compute-in-memory (CIM) accelerators. We develop a radiator-in-the-loop co-design methodology that directly links the permitted system TOPS (terra-operations per second) with the practical radiator cooling capacity in space. Our thermal simulations reveal that the separately located GPU die and HBMs create severe thermal hotspots under limited radiator capacity, necessitating GPU thermal throttling. In contrast, CIM accelerators exhibit a much more uniform heat distribution and consistently outperform GPUs in TOPS/W across a wide range of radiator budgets. We systematically evaluated the performance of CIM and GPU across various AI workloads and demonstrated that CIM has a magnified advantage for deployment in space under realistic thermal constraints.
23.8ARJun 1
CRAM-ER: Error-Resilient Spintronic Computational Random Access Memory for Scalable In-Memory ComputationSohan Salahuddin Mugdho, Md. Shahedul Hasan, Brahmdutta Dixit et al.
Deep neural networks (DNNs) have achieved state-of-the-art performance across diverse domains. However, typical Von Neumann compute paradigms face severe memory bottlenecks. Emerging near-memory and compute-in-memory approaches alleviate this but incur significant peripheral overhead. Computational Random Access Memory (CRAM) based on MRAM enables in-situ logic without peripheral overhead, offering a dense, energy-efficient solution. However, probabilistic MRAM switching induces gate-level errors that limit the scalability and reliability of CRAM for accelerating DNN. Moreover, the large number of sequential MRAM writes severely constrains CRAM throughput. To address these challenges, we propose an error-resilient CRAM (CRAM-ER) architecture for scalable in-memory matrix-vector multiplications (MVMs). Our error-aware hardware-software co-design framework leverages a hybrid spintronic-CRAM + CMOS adder-tree architecture to mitigate the impact of device-level errors, demonstrating MVM functionality with high area and energy efficiency. We further develop an error-aware model fine-tuning and fine-grained error correction for enhanced error resilience. Evaluations of the CMOS+spintronic hybrid architecture on DNN benchmarks show near-lossless accuracy while reducing CRAM latency by up to 2 orders of magnitude, outperforming CPU/GPU+high-bandwidth DRAM in both energy efficiency and energy-delay product.
IRApr 11, 2023
Privacy-Preserving Matrix Factorization for Recommendation Systems using Gaussian MechanismSohan Salahuddin Mugdho, Hafiz Imtiaz
Building a recommendation system involves analyzing user data, which can potentially leak sensitive information about users. Anonymizing user data is often not sufficient for preserving user privacy. Motivated by this, we propose a privacy-preserving recommendation system based on the differential privacy framework and matrix factorization, which is one of the most popular algorithms for recommendation systems. As differential privacy is a powerful and robust mathematical framework for designing privacy-preserving machine learning algorithms, it is possible to prevent adversaries from extracting sensitive user information even if the adversary possesses their publicly available (auxiliary) information. We implement differential privacy via the Gaussian mechanism in the form of output perturbation and release user profiles that satisfy privacy definitions. We employ Rényi Differential Privacy for a tight characterization of the overall privacy loss. We perform extensive experiments on real data to demonstrate that our proposed algorithm can offer excellent utility for some parameter choices, while guaranteeing strict privacy.
ARJul 17, 2024
StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN AcceleratorsEthan G Rogers, Sohan Salahuddin Mugdho, Kshemal Kshemendra Gupte et al.
Crossbar-based in-memory computing (IMC) has emerged as a promising platform for hardware acceleration of deep neural networks (DNNs). However, the energy and latency of IMC systems are dominated by the large overhead of the peripheral analog-to-digital converters (ADCs). To address such ADC bottleneck, here we propose to implement stochastic processing of array-level partial sums (PS) for efficient IMC. Leveraging the probabilistic switching of spin-orbit torque magnetic tunnel junctions, the proposed PS processing eliminates the costly ADC, achieving significant improvement in energy and area efficiency. To mitigate accuracy loss, we develop PS-quantization-aware training that enables backward propagation across stochastic PS. Furthermore, a novel scheme with an inhomogeneous sampling length of the stochastic conversion is proposed. When running ResNet20 on the CIFAR-10 dataset, our architecture-to-algorithm co-design demonstrates up to 16x, 8x, and 10x improvement in energy, latency, and area, respectively, compared to IMC with standard ADC. Our optimized design configuration using stochastic PS achieved 130x (24x) improvement in Energy-Delay-Product compared to IMC with full precision ADC (sparse low-bit ADC), while maintaining near-software accuracy at various benchmark classification tasks.