LGApr 3, 2023
X-TIME: An in-memory engine for accelerating machine learning on tabular data with CAMsGiacomo Pedretti, John Moon, Pedro Bruel et al.
Structured, or tabular, data is the most common format in data science. While deep learning models have proven formidable in learning from unstructured data such as images or speech, they are less accurate than simpler approaches when learning from tabular data. In contrast, modern tree-based Machine Learning (ML) models shine in extracting relevant information from structured data. An essential requirement in data science is to reduce model inference latency in cases where, for example, models are used in a closed loop with simulation to accelerate scientific discovery. However, the hardware acceleration community has mostly focused on deep neural networks and largely ignored other forms of machine learning. Previous work has described the use of an analog content addressable memory (CAM) component for efficiently mapping random forests. In this work, we develop an analog-digital architecture that implements a novel increased precision analog CAM and a programmable chip for inference of state-of-the-art tree-based ML models, such as XGBoost, CatBoost, and others. Thanks to hardware-aware training, X-TIME reaches state-of-the-art accuracy and 119x higher throughput at 9740x lower latency with >150x improved energy efficiency compared with a state-of-the-art GPU for models with up to 4096 trees and depth of 8, with a 19W peak power consumption.
ARNov 29, 2023
RACE-IT: A Reconfigurable Analog Computing Engine for In-Memory Transformer AccelerationLei Zhao, Aishwarya Natarajan, Luca Buonanno et al.
Transformer models represent the cutting edge of Deep Neural Networks (DNNs) and excel in a wide range of machine learning tasks. However, processing these models demands significant computational resources and results in a substantial memory footprint. While In-memory Computing (IMC)offers promise for accelerating Vector-Matrix Multiplications(VMMs) with high computational parallelism and minimal data movement, employing it for other crucial DNN operators remains a formidable task. This challenge is exacerbated by the extensive use of complex activation functions, Softmax, and data-dependent matrix multiplications (DMMuls) within Transformer models. To address this challenge, we introduce a Reconfigurable Analog Computing Engine (RACE) by enhancing Analog Content Addressable Memories (ACAMs) to support broader operations. Based on the RACE, we propose the RACE-IT accelerator (meaning RACE for In-memory Transformers) to enable efficient analog-domain execution of all core operations of Transformer models. Given the flexibility of our proposed RACE in supporting arbitrary computations, RACE-IT is well-suited for adapting to emerging and non-traditional DNN architectures without requiring hardware modifications. We compare RACE-IT with various accelerators. Results show that RACE-IT increases performance by 453x and 15x, and reduces energy by 354x and 122x over the state-of-the-art GPUs and existing Transformer-specific IMC accelerators, respectively.
49.2ETMay 14
Accelerating Hybrid XOR$-$CNF Boolean Satisfiability Problems Natively with In-Memory ComputingHaesol Im, Fabian Böhm, Giacomo Pedretti et al.
The Boolean satisfiability (SAT) problem is a computationally challenging decision problem central to many industrial applications. For SAT problems in cryptanalysis, circuit design, and telecommunication, solutions can often be found more efficiently by representing them with a combination of exclusive OR (XOR) and conjunctive normal form (CNF) clauses. We propose a hardware accelerator architecture that natively embeds and solves such hybrid XOR--CNF problems using in-memory computing hardware. To achieve this, we introduce an algorithm and demonstrate, both experimentally and through simulations, how it can be efficiently implemented with memristor crossbar arrays. Compared to the conventional approaches that translate XOR--CNF problems to pure CNF problems, our simulations show that the accelerator improves computation speed, energy efficiency, and chip area utilization of in-memory accelerators by $\sim$10$\times$ for a set of hard cryptographic benchmarking problems. Moreover, the accelerator achieves a $\sim$10$\times$ speedup and a $\sim$1000$\times$ gain in energy efficiency over state-of-the-art SAT solvers running on CPUs.
41.5ETMay 12
A Fast and Energy-Efficient Latch-Based Memristive Analog Content-Addressable MemoryPaul-Philipp Manea, Aishwarya Natarajan, Jim Ignowski et al.
Analog content-addressable memories (aCAMs) based on memristors provide a promising pathway toward energy-efficient large-scale associative computing for Edge AI and embedded intelligence applications. They have been successfully applied to decision-tree inference and extend the capabilities of compute-in-memory (CIM) architectures beyond conventional vector-matrix multiplication. However, conventional designs such as the 6T2M architecture suffer from static search power, limited voltage gain, and pronounced match-line crosstalk, constraining analog precision and scalability. We introduce a strong-arm latched memristor (SALM) aCAM cell that replaces static voltage division with a dynamic current-race comparator, enabling high regenerative gain, intrinsic result latching, and near-zero static search power. Compared to 6T2M, SALM reduces read energy by 33% at identical latency while eliminating the gain and crosstalk limitations that prevent 6T2M from scaling to large arrays. SALM further enables scalable sequential and parallel latch sharing, and a dataset-aware optimization framework exposes an explicit energy-latency tradeoff, achieving up to 50% energy reduction at 3x latency across representative workloads. To enable architectural exploration, we develop a circuit-accurate behavioral model derived from SPICE lookup tables in 22 nm FD-SOI technology, capturing match-line dynamics and crosstalk. Integrated into the X-TIME decision-tree compiler, this framework demonstrates that SALM maintains near-software accuracy for high-dimensional datasets, whereas baseline designs degrade due to limited gain and cumulative crosstalk.
LGMay 19, 2025
Hardware-Adaptive and Superlinear-Capacity Memristor-based Associative MemoryChengping He, Mingrui Jiang, Keyi Shan et al.
Brain-inspired computing aims to mimic cognitive functions like associative memory, the ability to recall complete patterns from partial cues. Memristor technology offers promising hardware for such neuromorphic systems due to its potential for efficient in-memory analog computing. Hopfield Neural Networks (HNNs) are a classic model for associative memory, but implementations on conventional hardware suffer from efficiency bottlenecks, while prior memristor-based HNNs faced challenges with vulnerability to hardware defects due to offline training, limited storage capacity, and difficulty processing analog patterns. Here we introduce and experimentally demonstrate on integrated memristor hardware a new hardware-adaptive learning algorithm for associative memories that significantly improves defect tolerance and capacity, and naturally extends to scalable multilayer architectures capable of handling both binary and continuous patterns. Our approach achieves 3x effective capacity under 50% device faults compared to state-of-the-art methods. Furthermore, its extension to multilayer architectures enables superlinear capacity scaling (\(\propto N^{1.49}\ for binary patterns) and effective recalling of continuous patterns (\propto N^{1.74}\ scaling), as compared to linear capacity scaling for previous HNNs. It also provides flexibility to adjust capacity by tuning hidden neurons for the same-sized patterns. By leveraging the massive parallelism of the hardware enabled by synchronous updates, it reduces energy by 8.8x and latency by 99.7% for 64-dimensional patterns over asynchronous schemes, with greater improvements at scale. This promises the development of more reliable memristor-based associative memory systems and enables new applications research due to the significantly improved capacity, efficiency, and flexibility.