25.6 SY Apr 9
An Asynchronous Delta Modulator for Spike Encoding in Event-Driven Brain-Machine Interface
Kaushik Lakshmiramanan, Vineeta Nair, Ching-Yi Lin et al.
This paper presents the design and implementation of an asynchronous delta modulator as a spike encoder for event-driven neural recording in a 65 nm CMOS process. The proposed neuromorphic front-end converts analog signals into discrete, asynchronous ON and OFF spikes, compressing continuous biopotentials into spike trains compatible with spiking neural networks (SNNs). Its asynchronous operation allows direct integration with neuromorphic architectures for real-time decoding in closed-loop brain-machine interfaces (BMIs). Silicon measurements demonstrate an energy consumption of 60.73 nJ/spike, an F1-score of 80% relative to a behavioral model of the asynchronous delta modulator, and a compact pixel area of 73.45 µm $\times$ 73.64 µm.
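The ON/OFF spike encoding described in this abstract can be illustrated with a minimal behavioral sketch. This is not the authors' behavioral model: it assumes a uniformly sampled input (the actual circuit is asynchronous), a single symmetric threshold, and hypothetical function names `delta_modulate` and `reconstruct`.

```python
import math

def delta_modulate(samples, threshold):
    """Return (index, polarity) events; polarity is +1 (ON) or -1 (OFF).

    Assumption: the input is a sampled list of floats, standing in for the
    continuous-time signal seen by the real asynchronous circuit.
    """
    events = []
    reference = samples[0]  # level at which the last spike fired
    for i, x in enumerate(samples[1:], start=1):
        # Fire one ON spike per threshold crossed upward.
        while x - reference >= threshold:
            reference += threshold
            events.append((i, +1))
        # Fire one OFF spike per threshold crossed downward.
        while reference - x >= threshold:
            reference -= threshold
            events.append((i, -1))
    return events

def reconstruct(events, n, start, threshold):
    """Rebuild a staircase approximation of the input from the spike train."""
    out = []
    level = start
    k = 0
    for i in range(n):
        while k < len(events) and events[k][0] == i:
            level += events[k][1] * threshold
            k += 1
        out.append(level)
    return out

# Example: a slow sine wave encoded with a threshold of 0.1.
signal = [math.sin(2 * math.pi * t / 200) for t in range(400)]
spikes = delta_modulate(signal, threshold=0.1)
approx = reconstruct(spikes, len(signal), signal[0], 0.1)
max_err = max(abs(a - b) for a, b in zip(signal, approx))
```

In this sketch the reconstruction error stays below the threshold at every sample, and only the spike events need to be transmitted, which is the sense in which the encoder compresses the signal.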
29.4 SY Apr 22
Design Space Exploration for ReRAM-based Architectures to Address Scaling Non-idealities
Ching-Yi Lin, Sahil Shah
ReRAM-based in-memory computing (IMC) architectures are promising candidates for energy-efficient matrix-vector multiplication. While scaling up ReRAM arrays amortizes the cost of power-hungry peripheral circuits such as DACs and ADCs, it also introduces more parasitics along the signal path. Current design methodologies offer few practical guidelines for balancing these effects at an early design stage, forcing designers to rely on time-consuming, iterative transistor-level simulations. In this work, we propose a comprehensive framework for design space exploration that enables the selection of optimal array size, ADC resolution, and system frequency without requiring exhaustive simulations. The framework uses a specialized testbench to extract parameters from a limited set of representative transistor-level simulations. These parameters are then used to predict the performance of arbitrary architectures. We demonstrate the effectiveness of this framework through two realistic design cases aimed at maximizing energy efficiency (TOPS/W). The results show that the framework identifies optimal architectural configurations under strict power and error constraints, providing an efficient path for high-performance IMC design.
LG Apr 11, 2025
Low-Bit Integerization of Vision Transformers using Operand Reordering for Efficient Hardware
Ching-Yi Lin, Sahil Shah
Pre-trained vision transformers have achieved remarkable performance across various visual tasks but suffer from high computational and memory costs. While model quantization reduces memory usage by lowering precision, these models still incur significant computational overhead because values are dequantized before matrix operations. In this work, we analyze the computation graph and propose an integerization process based on operation reordering. Specifically, the process delays dequantization until after matrix operations. This enables integerized matrix multiplication and linear modules that process the quantized input directly. To validate our approach, we synthesize the self-attention module of ViT on systolic-array-based hardware. Experimental results show that our low-bit inference reduces per-PE power consumption for linear layers and matrix multiplication, bridging the gap between quantized models and efficient inference.
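The idea of delaying dequantization until after the matrix operation can be sketched with a small numerical example. This is an illustration under simplifying assumptions, not the paper's implementation: it assumes symmetric per-tensor quantization with zero-point 0, and the helper names (`quantize`, `dequantize`, `matmul`) and scale values are hypothetical.

```python
def quantize(matrix, scale):
    """Map floats to integers with a single per-tensor scale (assumed scheme)."""
    return [[round(v / scale) for v in row] for row in matrix]

def dequantize(matrix, scale):
    """Map integers (or integer accumulators) back to floats."""
    return [[v * scale for v in row] for row in matrix]

def matmul(a, b):
    """Plain matrix product; stays in integer arithmetic if inputs are ints."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

x = [[0.5, -1.0], [1.5, 0.25]]   # activations
w = [[0.2, 0.4], [-0.6, 0.8]]    # weights
sx, sw = 0.25, 0.2               # hypothetical quantization scales

xq, wq = quantize(x, sx), quantize(w, sw)

# Baseline order: dequantize both operands, then multiply in floating point.
baseline = matmul(dequantize(xq, sx), dequantize(wq, sw))

# Reordered: multiply the integers, then dequantize once with the combined scale.
int_product = matmul(xq, wq)
reordered = dequantize(int_product, sx * sw)
```

Because the scales are scalars, (sx · Xq)(sw · Wq) = (sx · sw)(Xq Wq), so both orders agree up to floating-point rounding while the reordered version keeps the multiply-accumulate work in integers.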
26.6 SY Mar 12
Ising-ReRAM: A Low Power Ising Machine ReRAM Crossbar for NP Problems
Everest Bloomer, Irem Didin, Ching-Yi Lin et al.
Computational workloads are growing exponentially, driving power consumption to unsustainable levels. Efficiently distributing large-scale networks is an NP-Complete problem equivalent to Boolean satisfiability (SAT), making it one of the core challenges in modern computation. To address this, physics- and device-inspired methods such as Ising systems have been explored for solving SAT more efficiently. In this work, we implement an Ising model equivalent of the 3-SAT problem using a ReRAM crossbar fabricated in the SkyWater 130 nm CMOS process. Our ReRAM-based algorithm achieves $91.0\%$ accuracy in matrix representation across iterative reprogramming cycles. Additionally, we establish a foundational energy profile by measuring the energy costs of small sub-matrix structures within the problem space, demonstrating a sub-linear growth trajectory when sub-matrices are combined into larger problems. These results demonstrate a promising platform for developing scalable architectures to accelerate NP-Complete problem solving.