YOCO: A Hybrid In-Memory Computing Architecture with 8-bit Sub-PetaOps/W In-Situ Multiply Arithmetic for Large-Scale AI
This addresses the problem of high energy consumption in large-scale AI inference for hardware developers, though it appears incremental as it builds on existing analog in-memory computing concepts.
The paper tackled the challenge of improving energy efficiency and throughput for AI accelerators by introducing YOCO, a hybrid in-memory computing architecture, which achieved up to 123.8 TOPS/W energy efficiency and up to 33.6x throughput improvements over state-of-the-art baselines.
In this paper, we further explore the potential of analog in-memory computing (AiMC) and introduce an innovative artificial intelligence (AI) accelerator architecture named YOCO, featuring three key proposals: (1) YOCO proposes a novel 8-bit in-situ multiply arithmetic (IMA) achieving 123.8 TOPS/W energy-efficiency and 34.9 TOPS throughput through efficient charge-domain computation and timedomain accumulation mechanism. (2) YOCO employs a hybrid ReRAM-SRAM memory structure to balance computational efficiency and storage density. (3) YOCO tailors an IMC-friendly attention computing flow with an efficient pipeline to accelerate the inference of transformer-based AI models. Compared to three SOTA baselines, YOCO on average improves energy efficiency by up to 3.9x-19.9x and throughput by up to 6.8x-33.6x across 10 CNN/transformer models.