AR AI ET LGDec 4, 2024

IMPACT:InMemory ComPuting Architecture Based on Y-FlAsh Technology for Coalesced Tsetlin Machine Inference

Omar Ghazal, Wei Wang, Shahar Kvatinsky, Farhad Merchant, Alex Yakovlev, Rishad Shafik

arXiv:2412.05327v11.2h-index: 35

Originality Incremental advance

AI Analysis

This addresses energy and latency issues for machine learning inference applications, though it is incremental as it builds on existing in-memory computing and Tsetlin machine concepts.

The paper tackles the data bandwidth bottleneck in machine learning by proposing an in-memory computing architecture using Y-Flash technology for coalesced Tsetlin machine inference, achieving 96.3% accuracy on MNIST and up to 2.46X energy efficiency improvements over existing methods.

The increasing demand for processing large volumes of data for machine learning models has pushed data bandwidth requirements beyond the capability of traditional von Neumann architecture. In-memory computing (IMC) has recently emerged as a promising solution to address this gap by enabling distributed data storage and processing at the micro-architectural level, significantly reducing both latency and energy. In this paper, we present the IMPACT: InMemory ComPuting Architecture Based on Y-FlAsh Technology for Coalesced Tsetlin Machine Inference, underpinned on a cutting-edge memory device, Y-Flash, fabricated on a 180 nm CMOS process. Y-Flash devices have recently been demonstrated for digital and analog memory applications, offering high yield, non-volatility, and low power consumption. The IMPACT leverages the Y-Flash array to implement the inference of a novel machine learning algorithm: coalesced Tsetlin machine (CoTM) based on propositional logic. CoTM utilizes Tsetlin automata (TA) to create Boolean feature selections stochastically across parallel clauses. The IMPACT is organized into two computational crossbars for storing the TA and weights. Through validation on the MNIST dataset, IMPACT achieved 96.3% accuracy. The IMPACT demonstrated improvements in energy efficiency, e.g., 2.23X over CNN-based ReRAM, 2.46X over Neuromorphic using NOR-Flash, and 2.06X over DNN-based PCM, suited for modern ML inference applications.

View on arXiv PDF

Similar