NEJan 1
RMAAT: Astrocyte-Inspired Memory Compression and Replay for Efficient Long-Context TransformersMd Zesun Ahmed Mia, Malyaban Bal, Abhronil Sengupta
The quadratic complexity of self-attention mechanism presents a significant impediment to applying Transformer models to long sequences. This work explores computational principles derived from astrocytes-glial cells critical for biological memory and synaptic modulation-as a complementary approach to conventional architectural modifications for efficient self-attention. We introduce the Recurrent Memory Augmented Astromorphic Transformer (RMAAT), an architecture integrating abstracted astrocyte functionalities. RMAAT employs a recurrent, segment-based processing strategy where persistent memory tokens propagate contextual information. An adaptive compression mechanism, governed by a novel retention factor derived from simulated astrocyte long-term plasticity (LTP), modulates these tokens. Attention within segments utilizes an efficient, linear-complexity mechanism inspired by astrocyte short-term plasticity (STP). Training is performed using Astrocytic Memory Replay Backpropagation (AMRB), a novel algorithm designed for memory efficiency in recurrent networks. Evaluations on the Long Range Arena (LRA) benchmark demonstrate RMAAT's competitive accuracy and substantial improvements in computational and memory efficiency, indicating the potential of incorporating astrocyte-inspired dynamics into scalable sequence models.
ETMay 8
Bayesian Optimization of Crossbar-Based Compute-In-Memory System Design for Efficient DNN InferenceArnob Saha, Bibhas Manna, Nikhil Kotikalapudi et al.
Leveraging the high density and energy efficiency of Compute-In-Memory (CIM) crossbar-based Deep Neural Network (DNN) accelerators requires optimal Design Space Exploration (DSE), which becomes increasingly challenging as complex models for advanced AI workloads expand the highly non-convex design space. Moreover, heterogeneous layer workloads (e.g., memory- vs. compute-bound) and learning representations make layer-wise NN parameter allocation beneficial for efficiency but severely exacerbate the design space complexity by expanding the number of parameters to be tuned for simultaneous multi-objective optimization. Among existing DSE approaches, multi-objective Bayesian Optimization (BO) is promising, as it explores high-quality design solutions while querying costly CIM simulators selectively. In this work, we propose a multi-objective BO framework that holistically co-optimizes hardware and algorithm parameters of a CIM crossbar-based hardware accelerator for various DNN inference tasks. Depending on NN model depth, our framework handles high-dimensional design spaces (with $26$ and $50$ dimensions) and extremely large search complexities on the order of $O(10^{12})$ and $O(10^{27})$ for VGG8/CIFAR-10 and VGG16/Tiny-ImageNet-200. Our method attains $91.72 \%$ and $57.2 \%$ accuracy, respectively, comparable to baseline designs, while improving chip area ($65.52 \%$ and $50.7 \%$), read latency ($9.52 \%$ and $13.27 \%$), read dynamic energy ($31.23 \%$ and $52.07 \%$) and increasing memory utilization ($13.41 \%$ and $2.67 \%$).
NEFeb 12
Energy-Aware Spike Budgeting for Continual Learning in Spiking Neural Networks for Neuromorphic VisionAnika Tabassum Meem, Muntasir Hossain Nadid, Md Zesun Ahmed Mia
Neuromorphic vision systems based on spiking neural networks (SNNs) offer ultra-low-power perception for event-based and frame-based cameras, yet catastrophic forgetting remains a critical barrier to deployment in continually evolving environments. Existing continual learning methods, developed primarily for artificial neural networks, seldom jointly optimize accuracy and energy efficiency, with particularly limited exploration on event-based datasets. We propose an energy-aware spike budgeting framework for continual SNN learning that integrates experience replay, learnable leaky integrate-and-fire neuron parameters, and an adaptive spike scheduler to enforce dataset-specific energy constraints during training. Our approach exhibits modality-dependent behavior: on frame-based datasets (MNIST, CIFAR-10), spike budgeting acts as a sparsity-inducing regularizer, improving accuracy while reducing spike rates by up to 47\%; on event-based datasets (DVS-Gesture, N-MNIST, CIFAR-10-DVS), controlled budget relaxation enables accuracy gains up to 17.45 percentage points with minimal computational overhead. Across five benchmarks spanning both modalities, our method demonstrates consistent performance improvements while minimizing dynamic power consumption, advancing the practical viability of continual learning in neuromorphic vision systems.
ARApr 8
Trilinear Compute-in-Memory Architecture for Energy-Efficient Transformer AccelerationMd Zesun Ahmed Mia, Jiahui Duan, Kai Ni et al.
Self-attention in Transformers generates dynamic operands that force conventional Compute-in-Memory (CIM) accelerators into costly non-volatile memory (NVM) reprogramming cycles, degrading throughput and stressing device endurance. Existing solutions either reduce but retain NVM writes through matrix decomposition or sparsity, or move attention computation to digital CMOS at the expense of NVM density. We present TrilinearCIM, a Double-Gate FeFET (DG-FeFET)-based architecture that uses back-gate modulation to realize a three-operand multiply-accumulate primitive for in-memory attention computation without dynamic ferroelectric reprogramming. Evaluated on BERT-base (GLUE) and ViT-base (ImageNet and CIFAR), TrilinearCIM outperforms conventional CIM on seven of nine GLUE tasks while achieving up to 46.6\% energy reduction and 20.4\% latency improvement over conventional FeFET CIM at 37.3\% area overhead. To our knowledge, this is the first architecture to perform complete Transformer attention computation exclusively in NVM cores without runtime reprogramming.
NEDec 18, 2023
Delving Deeper Into Astromorphic TransformersMd Zesun Ahmed Mia, Malyaban Bal, Abhronil Sengupta
Preliminary attempts at incorporating the critical role of astrocytes - cells that constitute more than 50\% of human brain cells - in brain-inspired neuromorphic computing remain in infancy. This paper seeks to delve deeper into various key aspects of neuron-synapse-astrocyte interactions to mimic self-attention mechanisms in Transformers. The cross-layer perspective explored in this work involves bioplausible modeling of Hebbian and presynaptic plasticities in neuron-astrocyte networks, incorporating effects of non-linearities and feedback along with algorithmic formulations to map the neuron-astrocyte computations to self-attention mechanism and evaluating the impact of incorporating bio-realistic effects from the machine learning application side. Our analysis on sentiment and image classification tasks (IMDB and CIFAR10 datasets) highlights the advantages of Astromorphic Transformers, offering improved accuracy and learning speed. Furthermore, the model demonstrates strong natural language generation capabilities on the WikiText-2 dataset, achieving better perplexity compared to conventional models, thus showcasing enhanced generalization and stability across diverse machine learning tasks.
LGAug 6, 2025
Neuromorphic Cybersecurity with Semi-supervised Lifelong LearningMd Zesun Ahmed Mia, Malyaban Bal, Sen Lu et al.
Inspired by the brain's hierarchical processing and energy efficiency, this paper presents a Spiking Neural Network (SNN) architecture for lifelong Network Intrusion Detection System (NIDS). The proposed system first employs an efficient static SNN to identify potential intrusions, which then activates an adaptive dynamic SNN responsible for classifying the specific attack type. Mimicking biological adaptation, the dynamic classifier utilizes Grow When Required (GWR)-inspired structural plasticity and a novel Adaptive Spike-Timing-Dependent Plasticity (Ad-STDP) learning rule. These bio-plausible mechanisms enable the network to learn new threats incrementally while preserving existing knowledge. Tested on the UNSW-NB15 benchmark in a continual learning setting, the architecture demonstrates robust adaptation, reduced catastrophic forgetting, and achieves $85.3$\% overall accuracy. Furthermore, simulations using the Intel Lava framework confirm high operational sparsity, highlighting the potential for low-power deployment on neuromorphic hardware.