AR · AI · Apr 6, 2021

Designing Efficient and High-performance AI Accelerators with Customized STT-MRAM

arXiv:2104.02199v1 · 22 citations
Originality: Incremental advance
AI Analysis

This paper addresses the high area and power consumption of AI accelerators, a concern for hardware designers. The advance is incremental because it builds on existing STT-MRAM technology.

The paper tackles the design of AI accelerators using customized STT-MRAM to improve efficiency, achieving 75% area and 3% power savings compared to SRAM-based implementations with minimal accuracy trade-offs.

In this paper, we demonstrate the design of efficient and high-performance AI/Deep Learning accelerators with customized STT-MRAM and a reconfigurable core. Based on model-driven detailed design space exploration, we present the design methodology of an innovative scratchpad-assisted on-chip STT-MRAM-based buffer system for high-performance accelerators. Using an analytically derived expression for the memory occupancy time of AI model weights and activation maps, the volatility of STT-MRAM is adjusted with process- and temperature-variation-aware scaling of the thermal stability factor to optimize the retention time, energy, read/write latency, and area of STT-MRAM. From the analysis of modern AI workloads and an accelerator implementation in 14nm technology, we verify the efficacy of our designed AI accelerator with STT-MRAM (STT-AI). Compared to an SRAM-based implementation, the STT-AI accelerator achieves 75% area and 3% power savings at iso-accuracy. Furthermore, with a relaxed bit error rate and negligible AI accuracy trade-off, the designed STT-AI Ultra accelerator achieves 75.4% and 3.5% savings in area and power, respectively, over regular SRAM-based accelerators.
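The link between thermal stability factor and retention time mentioned in the abstract can be illustrated with a rough sketch. The abstract does not give the authors' expression, so the code below assumes the standard Néel-Arrhenius relation, in which the mean thermal flip time is tau0 * exp(delta), with an assumed attempt period tau0 of about 1 ns. The function names, the 1 ns value, and the example retention targets are illustrative assumptions, and process and temperature variation are not modeled.

```python
import math

# Assumed attempt period (about 1 ns); not a value taken from the paper.
TAU0_S = 1e-9


def retention_failure_prob(delta, t_s, tau0_s=TAU0_S):
    """Probability that one bit flips thermally within t_s seconds.

    Assumes the Neel-Arrhenius model: mean flip time = tau0 * exp(delta).
    """
    return -math.expm1(-t_s / (tau0_s * math.exp(delta)))


def min_delta(t_s, p_fail, tau0_s=TAU0_S):
    """Smallest thermal stability factor that keeps the per-bit flip
    probability at or below p_fail over a retention window of t_s seconds."""
    return math.log(t_s / (tau0_s * -math.log1p(-p_fail)))


if __name__ == "__main__":
    # Hypothetical retention targets: storage-class versus short-lived buffers.
    targets = [("10 years", 10 * 365 * 24 * 3600.0), ("1 s", 1.0), ("1 ms", 1e-3)]
    for label, t_s in targets:
        print(f"{label:>8}: delta >= {min_delta(t_s, 1e-9):.1f}")
```

Under this model, shortening the required retention window from years to the time data actually occupies a buffer lowers the thermal stability factor needed, which is the lever the abstract describes for trading retention against energy, latency, and area. Accepting a higher per-bit failure probability lowers it further, in the spirit of the relaxed bit error rate variant.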

