64.9 IT Apr 10
A New Class of Geometric Analog Error Correction Codes for Crossbar Based In-Memory Computing
Ziyuan Zhu, Changcheng Yuan, Ron M. Roth et al.
Analog error correction codes have been proposed for analog in-memory computing on resistive crossbars, which can accelerate vector-matrix multiplication for machine learning. Unlike traditional communication or storage channels, this setting involves a mixed noise model with small perturbations and outlier errors. A number of analog codes have been proposed for handling a single outlier, and several constructions have also been developed to address multiple outliers. However, the set of available code families remains limited, covering only a narrow range of code lengths and dimensions. In this paper, we study a recently proposed family of geometric codes capable of handling multiple outliers, and develop a geometric analysis that characterizes their m-height profiles.
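To make the mixed noise model concrete, the following is a minimal sketch, not the geometric construction studied in the paper. It assumes a simple sum-checksum column appended to the weight matrix, bounded small perturbations of magnitude at most `delta` on each output, and at most one outlier. All function names and parameters are hypothetical and chosen only for illustration.

```python
import random

def encode_with_checksum(W):
    """Append a checksum column holding each row's sum (illustrative single-check code)."""
    return [row + [sum(row)] for row in W]

def vmm(x, W):
    """Ideal vector-matrix product x @ W, with W given as a list of rows."""
    cols = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(cols)]

def add_mixed_noise(y, delta, outlier_pos=None, outlier_mag=0.0, rng=None):
    """Add a perturbation in [-delta, delta] to every entry, plus an optional single outlier."""
    rng = rng or random.Random(0)
    noisy = [v + rng.uniform(-delta, delta) for v in y]
    if outlier_pos is not None:
        noisy[outlier_pos] += outlier_mag
    return noisy

def has_outlier(y_noisy, delta):
    """Flag an outlier when the checksum mismatch exceeds what small noise alone can cause."""
    k = len(y_noisy) - 1                      # number of data outputs
    syndrome = sum(y_noisy[:k]) - y_noisy[k]  # zero for a noiseless codeword
    return abs(syndrome) > (k + 1) * delta    # small noise contributes at most (k + 1) * delta

W_ext = encode_with_checksum([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
y = vmm([1.0, 2.0], W_ext)
print(has_outlier(add_mixed_noise(y, 0.01), 0.01))
print(has_outlier(add_mixed_noise(y, 0.01, outlier_pos=1, outlier_mag=1.0), 0.01))
```

Under these assumptions, an outlier is guaranteed to be flagged once its magnitude exceeds `2 * (k + 1) * delta`. A single check like this only detects one outlier. Locating it, or handling several, requires richer codes such as the families the abstract describes.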
LG Apr 3, 2023
X-TIME: An in-memory engine for accelerating machine learning on tabular data with CAMs
Giacomo Pedretti, John Moon, Pedro Bruel et al.
Structured, or tabular, data is the most common format in data science. While deep learning models have proven formidable in learning from unstructured data such as images or speech, they are less accurate than simpler approaches when learning from tabular data. In contrast, modern tree-based Machine Learning (ML) models shine in extracting relevant information from structured data. An essential requirement in data science is to reduce model inference latency in cases where, for example, models are used in a closed loop with simulation to accelerate scientific discovery. However, the hardware acceleration community has mostly focused on deep neural networks and largely ignored other forms of machine learning. Previous work has described the use of an analog content addressable memory (CAM) component for efficiently mapping random forests. In this work, we develop an analog-digital architecture that implements a novel increased precision analog CAM and a programmable chip for inference of state-of-the-art tree-based ML models, such as XGBoost, CatBoost, and others. Thanks to hardware-aware training, X-TIME reaches state-of-the-art accuracy and 119x higher throughput at 9740x lower latency with >150x improved energy efficiency compared with a state-of-the-art GPU for models with up to 4096 trees and depth of 8, with a 19W peak power consumption.
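As a rough illustration of how a decision tree can be laid out for range-matching memory, the sketch below turns each root-to-leaf path into one row of per-feature intervals and answers a query by finding the row whose intervals all contain the input. This is an assumption-laden software model, not the X-TIME architecture: the tree format, `tree_to_cam_rows`, and `cam_lookup` are hypothetical names. The sequential loop also stands in for a match that a CAM would perform on all rows in parallel.

```python
import math

# A tiny hypothetical tree: internal nodes go left when x[feature] < threshold.
TREE = {
    "feature": 0, "threshold": 0.5,
    "left": {"leaf": 1.0},
    "right": {
        "feature": 1, "threshold": 2.0,
        "left": {"leaf": -1.0},
        "right": {"leaf": 3.0},
    },
}

def tree_to_cam_rows(node, n_features, bounds=None):
    """Return one (ranges, leaf_value) row per root-to-leaf path."""
    if bounds is None:
        bounds = [(-math.inf, math.inf)] * n_features
    if "leaf" in node:
        return [(list(bounds), node["leaf"])]
    f, t = node["feature"], node["threshold"]
    lo, hi = bounds[f]
    left = list(bounds)
    left[f] = (lo, min(hi, t))    # left branch tightens the upper bound
    right = list(bounds)
    right[f] = (max(lo, t), hi)   # right branch tightens the lower bound
    return (tree_to_cam_rows(node["left"], n_features, left)
            + tree_to_cam_rows(node["right"], n_features, right))

def cam_lookup(rows, x):
    """Return the leaf value of the row whose half-open ranges [lo, hi) all contain x."""
    for ranges, value in rows:
        if all(lo <= xi < hi for (lo, hi), xi in zip(ranges, x)):
            return value
    raise ValueError("no matching row")

rows = tree_to_cam_rows(TREE, n_features=2)
print(len(rows), cam_lookup(rows, [0.7, 1.0]))
```

An ensemble such as a random forest or a boosted model would, under the same assumptions, contribute one set of rows per tree, with the matched leaf values combined afterwards.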
AR Nov 29, 2023
RACE-IT: A Reconfigurable Analog Computing Engine for In-Memory Transformer Acceleration
Lei Zhao, Aishwarya Natarajan, Luca Buonanno et al.
Transformer models represent the cutting edge of Deep Neural Networks (DNNs) and excel in a wide range of machine learning tasks. However, processing these models demands significant computational resources and results in a substantial memory footprint. While In-memory Computing (IMC) offers promise for accelerating Vector-Matrix Multiplications (VMMs) with high computational parallelism and minimal data movement, employing it for other crucial DNN operators remains a formidable task. This challenge is exacerbated by the extensive use of complex activation functions, Softmax, and data-dependent matrix multiplications (DMMuls) within Transformer models. To address this challenge, we introduce a Reconfigurable Analog Computing Engine (RACE) by enhancing Analog Content Addressable Memories (ACAMs) to support broader operations. Based on the RACE, we propose the RACE-IT accelerator (meaning RACE for In-memory Transformers) to enable efficient analog-domain execution of all core operations of Transformer models. Given the flexibility of our proposed RACE in supporting arbitrary computations, RACE-IT is well-suited for adapting to emerging and non-traditional DNN architectures without requiring hardware modifications. We compare RACE-IT with various accelerators. Results show that RACE-IT increases performance by 453x and 15x, and reduces energy by 354x and 122x, over state-of-the-art GPUs and existing Transformer-specific IMC accelerators, respectively.