AR · May 29
A Reconfigurable Computing In-Memory Macro with Charge-sharing-based Weighted Accumulator
Junyi Yang, Shuai Dong, Zhengnan Fu et al.
SRAM-based analog computing-in-memory demonstrates outstanding efficiency. However, it faces three critical challenges: significant ADC overhead, high latency for multi-bit inputs, and limited read bitline voltage. To address these issues, this work proposes a multi-bit highly reconfigurable 256x128 in-memory computing array supporting 1-7b input, 2-4b weight, and 1-7b output. Three key innovations are introduced: 1) The IMADC occupies only 3% area overhead, achieving a 9x improvement compared to previous IMADC; 2) The BSCHA reduces latency by 1.9x and 6.6x compared to traditional pulse-width modulation (PWM) and bit-slicing modes, respectively; 3) A dual-8T bitcell enabling ternary weight storage through a decoupled read path, integrated with a read wordline under-driven cascode technique, improves linearity of unit discharge current by 7x and increases the usable read bitline voltage by 3.5x.
AI · Apr 10, 2023
NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems
Jason Yik, Korneel Van den Berghe, Douwe den Blanken et al. · eth-zurich
Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neuromorphic computing benchmark efforts have not seen widespread adoption due to a lack of inclusive, actionable, and iterative benchmark design and guidelines. To address these shortcomings, we present NeuroBench: a benchmark framework for neuromorphic computing algorithms and systems. NeuroBench is a collaboratively-designed effort from an open community of researchers across industry and academia, aiming to provide a representative structure for standardizing the evaluation of neuromorphic approaches. The NeuroBench framework introduces a common set of tools and systematic methodology for inclusive benchmark measurement, delivering an objective reference framework for quantifying neuromorphic approaches in both hardware-independent (algorithm track) and hardware-dependent (system track) settings. In this article, we outline tasks and guidelines for benchmarks across multiple application domains, and present initial performance baselines across neuromorphic and conventional approaches for both benchmark tracks. NeuroBench is intended to continually expand its benchmarks and features to foster and track the progress made by the research community.
LG · Nov 19, 2022
Intelligence Processing Units Accelerate Neuromorphic Learning
Pao-Sheng Vincent Sun, Alexander Titterton, Anjlee Gopiani et al.
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency when performing inference with deep learning workloads. Error backpropagation is presently regarded as the most effective method for training SNNs, but in a twist of irony, when training on modern graphics processing units (GPUs) this becomes more expensive than non-spiking networks. The emergence of Graphcore's Intelligence Processing Units (IPUs) balances the parallelized nature of deep learning workloads with the sequential, reusable, and sparsified nature of operations prevalent when training SNNs. IPUs adopt multi-instruction multi-data (MIMD) parallelism by running individual processing threads on smaller data blocks, which is a natural fit for the sequential, non-vectorized steps required to solve spiking neuron dynamical state equations. We present an IPU-optimized release of our custom SNN Python package, snnTorch, which exploits fine-grained parallelism by utilizing low-level, pre-compiled custom operations to accelerate irregular and sparse data access patterns that are characteristic of training SNN workloads. We provide a rigorous performance assessment across a suite of commonly used spiking neuron models, and propose methods to further reduce training run-time via half-precision training. By amortizing the cost of sequential processing into vectorizable population codes, we ultimately demonstrate the potential for integrating domain-specific accelerators with the next generation of neural networks.
CV · Feb 28, 2023
Tracking Fast by Learning Slow: An Event-based Speed Adaptive Hand Tracker Leveraging Knowledge in RGB Domain
Chuanlin Lan, Ziyuan Yin, Arindam Basu et al.
3D hand tracking methods based on monocular RGB videos are easily affected by motion blur, while the event camera, a sensor with high temporal resolution and dynamic range, is naturally suited to this task thanks to its sparse output and low power consumption. However, obtaining 3D annotations of fast-moving hands is difficult, which hinders the construction of event-based hand-tracking datasets. In this paper, we present an event-based speed adaptive hand tracker (ESAHT) to solve the hand tracking problem with an event camera. We train a CNN model on a hand tracking dataset with slow motion, which lets the model leverage the knowledge of RGB-based hand tracking solutions, and enable it to work on fast hand tracking tasks. To realize our solution, we constructed the first 3D hand tracking dataset captured by an event camera in a real-world environment, devised two data augmentation methods to narrow the domain gap between slow and fast motion data, developed a speed adaptive event stream segmentation method to handle hand movements at different speeds, and introduced a new event-to-frame representation method that adapts to event streams of different lengths. Experiments showed that our solution outperformed RGB-based as well as previous event-based solutions in fast hand tracking tasks, and our codes and dataset will be publicly available.
NE · Mar 11
An Event-Driven E-Skin System with Dynamic Binary Scanning and real time SNN Classification
Gaishan Li, Zhengnan Fu, Anubhab Tripathi et al.
This paper presents a novel hardware system for high-speed, event-sparse sampling-based electronic skin (e-skin) that integrates sensing and neuromorphic computing. The system is built around a 16x16 piezoresistive tactile array with a front end and introduces an event-based binary scan search strategy to classify the digits. This event-driven strategy achieves a 12.8x reduction in scan counts, a 38.2x data compression rate, a 28.4x equivalent dynamic range, and 99% data sparsity, drastically reducing the data acquisition overhead. The resulting sparse data stream is processed by a multi-layer convolutional spiking neural network (Conv-SNN) implemented on an FPGA, which requires only 65% of the computation and 15.6% of the weight storage relative to a CNN. Despite these significant efficiency gains, the system maintains a high classification accuracy of 92.11% for real-time handwritten digit recognition. Furthermore, a real neuromorphic tactile dataset using Address Event Representation (AER) is constructed. This work demonstrates a fully integrated, event-driven pipeline from analog sensing to neuromorphic classification, offering an efficient solution for robotic perception and human-computer interaction.
LG · Apr 3, 2025 · Code
SPACE: SPike-Aware Consistency Enhancement for Test-Time Adaptation in Spiking Neural Networks
Xinyu Luo, Kecheng Chen, Pao-Sheng Vincent Sun et al.
Spiking Neural Networks (SNNs), as a biologically plausible alternative to Artificial Neural Networks (ANNs), have demonstrated advantages in terms of energy efficiency, temporal processing, and biological plausibility. However, SNNs are highly sensitive to distribution shifts, which can significantly degrade their performance in real-world scenarios. Traditional test-time adaptation (TTA) methods designed for ANNs often fail to address the unique computational dynamics of SNNs, such as sparsity and temporal spiking behavior. To address these challenges, we propose SPike-Aware Consistency Enhancement (SPACE), the first source-free and single-instance TTA method specifically designed for SNNs. SPACE leverages the inherent spike dynamics of SNNs to maximize the consistency of spike-behavior-based local feature maps across augmented versions of a single test sample, enabling robust adaptation without requiring source data. We evaluate SPACE on multiple datasets. Furthermore, SPACE exhibits robust generalization across diverse network architectures, consistently enhancing the performance of SNNs on CNNs, Transformer, and ConvLSTM architectures. Experimental results show that SPACE outperforms state-of-the-art ANN methods while maintaining lower computational cost, highlighting its effectiveness and robustness for SNNs in real-world settings. The code will be available at https://github.com/ethanxyluo/SPACE.
AR · Mar 11
In-Memory ADC-Based Nonlinear Activation Quantization for Efficient In-Memory Computing
Shuai Dong, Junyi Yang, Biyan Zhou et al.
In deep networks, operations such as ReLU and hardware-driven clamping often cause activations to accumulate near the edges of the distribution, leading to biased clustering and suboptimal quantization in existing nonlinear (NL) quantization methods. This paper introduces Boundary Suppressed K-Means Quantization (BS-KMQ), a novel NL quantization approach designed to reduce the resolution requirements of analog-to-digital converters (ADCs) in in-memory computing (IMC) systems. By suppressing boundary outliers before clustering, BS-KMQ achieves more balanced and informative NL quantization levels. The resulting NL references are implemented using a reconfigurable in-memory NL-ADC, achieving a 7x area improvement over prior NL-ADC designs. When evaluated on ResNet-18, VGG-16, Inception-V3, and DistilBERT, BS-KMQ achieves at least 3x lower quantization error compared to linear, Lloyd-Max, cumulative distribution function (CDF), and K-means methods. It also improves post-training quantization accuracy by up to 66.8%, 25.4%, 66.6%, and 67.7%, respectively, compared to linear quantization. After low-bit fine-tuning, BS-KMQ maintains competitive accuracy with significantly fewer NL-ADC levels (3/3/4/4b). System-level simulations on ResNet-18 (6/2/3b) demonstrate up to a 4x speedup and 24x energy efficiency improvement over existing IMC accelerators.
NE · May 3
SNNF: An SNN-based Near-Sensor Noise Filter for Dynamic Vision Sensors
Yahan Yang, Pradeep Kumar Gopalakrishnan, Chang Chip Hong et al.
Dynamic Vision Sensors (DVS) exhibit exceptional dynamic range and low power consumption, making them ideal for edge applications in the Internet of Video Things (IoVT). However, their output is often degraded by spurious Background Activity (BA) noise, leading to unnecessary computational overhead. This paper proposes SNNF, a near-sensor BA noise filter that integrates a compact Event-Based Binary Image (EBBI) representation, a parallel memory architecture, and a single-layer Spiking Neural Network (SNN) classifier. Trained on representative DVS data, the SNN distinguishes signal events from noise with an AUC of 0.89 on standard datasets. The binary-array-based EBBI eliminates timestamp dependency, significantly reducing memory footprint. Moreover, the SNN's spike-based computation replaces power-hungry multipliers with simple accumulation logic and minimizes inter-neuron data width, resulting in an extremely hardware-efficient design. FPGA implementation results show that SNNF reduces memory and logic resources to approximately 11% and 40%, respectively of state-of-the-art filters, while achieving a throughput of 29 Mega events per second (Meps). In a 65 nm CMOS ASIC implementation, SNNF achieves 44.4 Meps with an area and power consumption of only ~13% and <5% of the corresponding ANN-based designs. These results demonstrate that SNNF provides an excellent balance between filtering accuracy and hardware efficiency, making it highly suitable for resource-constrained, near-sensor deployment.
LG · Dec 26, 2023
Combining SNNs with Filtering for Efficient Neural Decoding in Implantable Brain-Machine Interfaces
Biyan Zhou, Pao-Sheng Vincent Sun, Arindam Basu
While it is important to make implantable brain-machine interfaces (iBMI) wireless to increase patient comfort and safety, the trend of increased channel count in recent neural probes poses a challenge due to the concomitant increase in the data rate. Extracting information from raw data at the source by using edge computing is a promising solution to this problem, with integrated intention decoders providing the best compression ratio. Recent benchmarking efforts have shown recurrent neural networks to be the best solution. Spiking Neural Networks (SNNs) emerge as a promising solution for resource-efficient neural decoding, while Long Short Term Memory (LSTM) networks achieve the best accuracy. In this work, we show that combining traditional signal processing techniques, namely signal filtering, with SNNs improves their decoding performance significantly for regression tasks, closing the gap with LSTMs, at little added cost. Results with different filters are shown, with Bessel filters providing the best performance. Two block-bidirectional Bessel filters have been used: one for low latency and another for high accuracy. Adding the high accuracy variant of the Bessel filters to the output of ANN, SNN and variants provided statistically significant benefits with maximum gains of $\approx 5\%$ and $8\%$ in $R^2$ for two SNN topologies (SNN\_Streaming and SNN\_3D). Our work presents state of the art results for this dataset and paves the way for decoder-integrated-implants of the future.
NE · Mar 13
SRAM-Based Compute-in-Memory Accelerator for Linear-decay Spiking Neural Networks
Hongyang Shang, Shuai Dong, Yahan Yang et al.
Spiking Neural Networks (SNNs) have emerged as a biologically inspired alternative to conventional deep networks, offering event-driven and energy-efficient computation. However, their throughput remains constrained by the serial update of neuron membrane states. While many hardware accelerators and Compute-in-Memory (CIM) architectures efficiently parallelize the synaptic operation (W x I) achieving O(1) complexity for matrix-vector multiplication, the subsequent state update step still requires O(N) time to refresh all neuron membrane potentials. This mismatch makes state update the dominant latency and energy bottleneck in SNN inference. To address this challenge, we propose an SRAM-based CIM for SNN with Linear Decay Leaky Integrate-and-Fire (LD-LIF) Neuron that co-optimizes algorithm and hardware. At the algorithmic level, we replace the conventional exponential membrane decay with a linear decay approximation, converting costly multiplications into simple additions while accuracy drops only around 1%. At the architectural level, we introduce an in-memory parallel update scheme that performs in-place decay directly within the SRAM array, eliminating the need for global sequential updates. Evaluated on benchmark SNN workloads, the proposed method achieves a 1.1 x to 16.7 x reduction of SOP energy consumption, while providing 15.9 x to 69 x more energy efficiency, with negligible accuracy loss relative to original decay models. This work highlights that beyond accelerating the (W x I) computation, optimizing state-update dynamics within CIM architectures is essential for scalable, low-power, and real-time neuromorphic processing.
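The algorithmic idea in the abstract above, replacing exponential membrane decay with a linear one so that the per-step multiplication becomes an addition, can be illustrated with a minimal NumPy sketch. The function names, the reset-to-zero rule, the clamp at zero, and all parameter values are assumptions made for illustration; they are not taken from the paper, and the sketch does not model the in-SRAM update scheme.

```python
import numpy as np

def lif_exponential(inputs, beta=0.9, threshold=1.0):
    """Conventional LIF: the membrane is multiplied by beta every step."""
    v = np.zeros(inputs.shape[1])
    spikes = np.zeros_like(inputs, dtype=float)
    for t, x in enumerate(inputs):
        v = beta * v + x                  # one multiply per neuron per step
        fired = v >= threshold
        spikes[t] = fired
        v = np.where(fired, 0.0, v)       # assumed reset-to-zero rule
    return spikes

def lif_linear_decay(inputs, decay=0.1, threshold=1.0):
    """Linear-decay LIF: the membrane moves toward zero by a fixed step."""
    v = np.zeros(inputs.shape[1])
    spikes = np.zeros_like(inputs, dtype=float)
    for t, x in enumerate(inputs):
        # Add or subtract a constant only; clamp so the decay never crosses zero.
        v = np.where(v > 0, np.maximum(v - decay, 0.0), np.minimum(v + decay, 0.0))
        v = v + x
        fired = v >= threshold
        spikes[t] = fired
        v = np.where(fired, 0.0, v)       # assumed reset-to-zero rule
    return spikes

if __name__ == "__main__":
    drive = np.full((6, 1), 0.5)          # constant input, one neuron, six steps
    print(lif_linear_decay(drive, decay=0.25)[:, 0])
    print(lif_exponential(drive, beta=0.5)[:, 0])
```

Both functions still loop over time steps; the only difference shown here is the arithmetic used for the decay.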
LG · Nov 27, 2025
An energy-efficient spiking neural network with continuous learning for self-adaptive brain-machine interface
Zhou Biyan, Arindam Basu
The number of simultaneously recorded neurons follows an exponentially increasing trend in implantable brain-machine interfaces (iBMIs). Integrating the neural decoder in the implant is an effective data compression method for future wireless iBMIs. However, the non-stationarity of the system makes the performance of the decoder unreliable. To avoid frequent retraining of the decoder and to ensure the safety and comfort of the iBMI user, continuous learning is essential for real-life applications. Since Deep Spiking Neural Networks (DSNNs) are being recognized as a promising approach for developing resource-efficient neural decoders, we propose continuous learning approaches with Reinforcement Learning (RL) algorithms adapted for DSNNs. Banditron and AGREL are chosen as the two candidate RL algorithms since they can be trained with limited computational resources, effectively addressing the non-stationarity problem and fitting the energy constraints of implantable devices. To assess the effectiveness of the proposed methods, we conducted both open-loop and closed-loop experiments. The accuracy of open-loop experiments conducted with DSNN Banditron and DSNN AGREL remains stable over extended periods. Meanwhile, in the closed-loop experiment with perturbations, the time-to-target of DSNN Banditron was comparable to that of DSNN AGREL, while DSNN Banditron achieved reductions of 98% in memory access usage and 99% in the requirements for multiply-and-accumulate (MAC) operations during training. Compared to previous continuous learning SNN decoders, DSNN Banditron requires 98% fewer computations, making it a prime candidate for future wireless iBMI systems.
LG · Oct 13, 2025
Efficient Edge Test-Time Adaptation via Latent Feature Coordinate Correction
Xinyu Luo, Jie Liu, Kecheng Chen et al.
Edge devices face significant challenges due to limited computational resources and distribution shifts, making efficient and adaptable machine learning essential. Existing test-time adaptation (TTA) methods often rely on gradient-based optimization or batch processing, which are inherently unsuitable for resource-constrained edge scenarios due to their reliance on backpropagation and high computational demands. Gradient-free alternatives address these issues but often suffer from limited learning capacity, lack flexibility, or impose architectural constraints. To overcome these limitations, we propose a novel single-instance TTA method tailored for edge devices (TED), which employs forward-only coordinate optimization in the principal subspace of the latent representation using the covariance matrix adaptation evolution strategy (CMA-ES). By updating a compact low-dimensional vector, TED not only enhances output confidence but also aligns the latent representation closer to the source latent distribution within the latent principal subspace. This is achieved without backpropagation, keeping the model parameters frozen, and enabling efficient, forgetting-free adaptation with minimal memory and computational overhead. Experiments on image classification and keyword spotting tasks across the ImageNet and Google Speech Commands series datasets demonstrate that TED achieves state-of-the-art performance while reducing computational complexity by up to 63 times, offering a practical and scalable solution for real-world edge applications. Furthermore, we successfully deployed TED on the ZYNQ-7020 platform, demonstrating its feasibility and effectiveness for resource-constrained edge devices in real-world deployments.
LG · May 9, 2025
Architectural Exploration of Hybrid Neural Decoders for Neuromorphic Implantable BMI
Vivek Mohan, Biyan Zhou, Zhou Wang et al.
This work presents an efficient decoding pipeline for neuromorphic implantable brain-machine interfaces (Neu-iBMI), leveraging sparse neural event data from an event-based neural sensing scheme. We introduce a tunable event filter (EvFilter), which also functions as a spike detector (EvFilter-SPD), significantly reducing the number of events processed for decoding by 192X and 554X, respectively. The proposed pipeline achieves high decoding performance, up to R^2=0.73, with ANN- and SNN-based decoders, eliminating the need for signal recovery, spike detection, or sorting, commonly performed in conventional iBMI systems. The SNN-Decoder reduces the computations and memory required by 5-23X compared to NN- and LSTM-Decoders, while the ST-NN-Decoder delivers similar performance to an LSTM-Decoder while requiring 2.5X fewer resources. This streamlined approach significantly reduces computational and memory demands, making it ideal for low-power, on-implant, or wearable iBMIs.
CR · Aug 3, 2021
DeepFreeze: Cold Boot Attacks and High Fidelity Model Recovery on Commercial EdgeML Device
Yoo-Seung Won, Soham Chatterjee, Dirmanto Jap et al.
EdgeML accelerators like Intel Neural Compute Stick 2 (NCS) can enable efficient edge-based inference with complex pre-trained models. The models are loaded in the host (like Raspberry Pi) and then transferred to NCS for inference. In this paper, we demonstrate practical and low-cost cold boot based model recovery attacks on NCS to recover the model architecture and weights, loaded from the Raspberry Pi. The architecture is recovered with 100% success and weights with an error rate of 0.04%. The recovered model reports maximum accuracy loss of 0.5% as compared to original model and allows high fidelity transfer of adversarial examples. We further extend our study to other cold boot attack setups reported in the literature with higher error rates leading to accuracy loss as high as 70%. We then propose a methodology based on knowledge distillation to correct the erroneous weights in recovered model, even without access to original training data. The proposed attack remains unaffected by the model encryption features of the OpenVINO and NCS framework.
NE · Jun 23, 2021
Prospects for Analog Circuits in Deep Networks
Shih-Chii Liu, John Paul Strachan, Arindam Basu
Operations typically used in machine learning algorithms (e.g. adds and soft max) can be implemented by compact analog circuits. Analog Application-Specific Integrated Circuit (ASIC) designs that implement these algorithms using techniques such as charge sharing circuits and subthreshold transistors achieve very high power efficiencies. With the recent advances in deep learning algorithms, focus has shifted to hardware digital accelerator designs that implement the prevalent matrix-vector multiplication operations. Power in these designs is usually dominated by the memory access power of off-chip DRAM needed for storing the network weights and activations. Emerging dense non-volatile memory technologies can help to provide on-chip memory, and analog circuits can be well suited to implement the needed matrix-vector multiplication operations coupled with in-memory computing approaches. This paper presents a brief review of analog designs that implement various machine learning algorithms. It then presents an outlook for the use of analog circuits in low-power deep network accelerators suitable for edge or tiny machine learning applications.
SP · Aug 21, 2020
ADIC: Anomaly Detection Integrated Circuit in 65nm CMOS utilizing Approximate Computing
Bapi Kar, Pradeep Kumar Gopalakrishnan, Sumon Kumar Bose et al.
In this paper, we present a low-power anomaly detection integrated circuit (ADIC) based on a one-class classifier (OCC) neural network. The ADIC achieves low-power operation through a combination of (a) careful choice of algorithm for online learning and (b) approximate computing techniques to lower average energy. In particular, the online pseudoinverse update method (OPIUM) is used to train a randomized neural network for quick and resource-efficient learning. An additional 42% energy saving can be achieved when a lighter version of the OPIUM method is used for training with the same number of data samples, with no significant compromise in the quality of inference. Instead of a single classifier with a large number of neurons, an ensemble of K base learners is chosen to reduce learning memory by a factor of K. This also enables approximate computing by dynamically varying the neural network size based on anomaly detection. Fabricated in 65nm CMOS, the ADIC has K = 7 Base Learners (BL) with 32 neurons in each BL and dissipates 11.87pJ/OP and 3.35pJ/OP during learning and inference respectively at Vdd = 0.75V when all 7 BLs are enabled. Further, evaluated on the NASA bearing dataset, approximately 80% of the chip can be shut down for 99% of the lifetime, leading to an energy efficiency of 0.48pJ/OP, an 18.5 times reduction over full-precision computing running at Vdd = 1.2V throughout the lifetime.
CV · Jul 21, 2020
A Hybrid Neuromorphic Object Tracking and Classification Framework for Real-time Systems
Andres Ussa, Chockalingam Senthil Rajen, Deepak Singla et al.
Deep learning inference that needs to largely take place on the 'edge' is a highly computational and memory intensive workload, making it intractable for low-power, embedded platforms such as mobile nodes and remote security applications. To address this challenge, this paper proposes a real-time, hybrid neuromorphic framework for object tracking and classification using event-based cameras that possess properties such as low-power consumption (5-14 mW) and high dynamic range (120 dB). Nonetheless, unlike traditional approaches of using event-by-event processing, this work uses a mixed frame and event approach to get energy savings with high performance. Using a frame-based region proposal method based on the density of foreground events, a hardware-friendly object tracking scheme is implemented using the apparent object velocity while tackling occlusion scenarios. The object track input is converted back to spikes for TrueNorth classification via the energy-efficient deep network (EEDN) pipeline. Using originally collected datasets, we train the TrueNorth model on the hardware track outputs, instead of using ground truth object locations as commonly done, and demonstrate the ability of our system to handle practical surveillance scenarios. As an optional paradigm, to exploit the low latency and asynchronous nature of neuromorphic vision sensors (NVS), we also propose a continuous-time tracker with C++ implementation where each event is processed individually. Thereby, we extensively compare the proposed methodologies to state-of-the-art event-based and frame-based methods for object tracking and classification, and demonstrate the use case of our neuromorphic approach for real-time and embedded applications without sacrificing performance. Finally, we also showcase the efficacy of the proposed system to a standard RGB camera setup when evaluated over several hours of traffic recordings.
CV · May 31, 2020
EBBINNOT: A Hardware Efficient Hybrid Event-Frame Tracker for Stationary Dynamic Vision Sensors
Vivek Mohan, Deepak Singla, Tarun Pulluri et al.
As an alternative sensing paradigm, dynamic vision sensors (DVS) have been recently explored to tackle scenarios where conventional sensors result in high data rate and processing time. This paper presents a hybrid event-frame approach for detecting and tracking objects recorded by a stationary neuromorphic sensor, thereby exploiting the sparse DVS output in a low-power setting for traffic monitoring. Specifically, we propose a hardware-efficient processing pipeline that optimizes memory and computational needs to enable long-term battery-powered usage for IoT applications. To exploit the background removal property of a static DVS, we propose an event-based binary image creation that signals the presence or absence of events in a frame duration. This reduces the memory requirement and enables the use of simple algorithms like median filtering and connected component labeling for denoising and region proposal, respectively. To overcome the fragmentation issue, a YOLO-inspired neural network based detector and classifier that merges fragmented region proposals is proposed. Finally, a new overlap-based tracker that exploits the overlap between detections and tracks is proposed, with heuristics to overcome occlusion. The proposed pipeline is evaluated with more than 5 hours of traffic recordings spanning three different locations on two different neuromorphic sensors (DVS and CeleX) and demonstrates similar performance on both. Compared to existing event-based feature trackers, our method provides similar accuracy while needing approximately 6 times fewer computations. To the best of our knowledge, this is the first time a stationary DVS based traffic monitoring solution is extensively compared to simultaneously recorded RGB frame-based methods while showing tremendous promise by outperforming state-of-the-art deep learning solutions.
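The first two stages described in the abstract above, event-based binary image creation followed by median filtering, can be sketched in a few lines of NumPy. The `(x, y, t)` event layout, the function names, the 3x3 window, and the zero padding at the borders are assumptions for illustration, not details from the paper; connected component labeling and the later stages are not shown.

```python
import numpy as np

def make_ebbi(events, height, width, t_start, t_end):
    """Build a binary image: 1 where at least one event fell in [t_start, t_end)."""
    ebbi = np.zeros((height, width), dtype=np.uint8)
    ev = np.asarray(events, dtype=np.int64).reshape(-1, 3)  # rows of (x, y, t)
    in_window = (ev[:, 2] >= t_start) & (ev[:, 2] < t_end)
    ebbi[ev[in_window, 1], ev[in_window, 0]] = 1  # repeated events still give 1
    return ebbi

def median_filter_3x3(img):
    """3x3 median on a binary image: output is 1 iff at least 5 of 9 pixels are 1."""
    h, w = img.shape
    padded = np.pad(img, 1).astype(np.int32)      # zero padding at the borders
    total = np.zeros((h, w), dtype=np.int32)
    for dy in range(3):
        for dx in range(3):
            total += padded[dy:dy + h, dx:dx + w]
    return (total >= 5).astype(np.uint8)

if __name__ == "__main__":
    block = [(x, y, 10) for x in range(2, 5) for y in range(2, 5)]  # 3x3 object
    noise = [(7, 7, 12), (7, 7, 13), (0, 0, 100)]  # isolated pixel, late event
    frame = make_ebbi(block + noise, 10, 10, t_start=0, t_end=50)
    print(frame.sum(), median_filter_3x3(frame).sum())
```

Because each pixel stores a single bit and no timestamp, the frame needs only `height * width` bits, which is the memory saving the abstract refers to.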
AS · Apr 16, 2020
Deep Neural Network for Respiratory Sound Classification in Wearable Devices Enabled by Patient Specific Model Tuning
Jyotibdha Acharya, Arindam Basu
The primary objective of this paper is to build classification models and strategies to identify breathing sound anomalies (wheeze, crackle) for automated diagnosis of respiratory and pulmonary diseases. In this work we propose a deep CNN-RNN model that classifies respiratory sounds based on Mel-spectrograms. We also implement a patient specific model tuning strategy that first screens respiratory patients and then builds patient specific classification models using limited patient data for reliable anomaly detection. Moreover, we devise a local log quantization strategy for model weights to reduce the memory footprint for deployment in memory constrained systems such as wearable devices. The proposed hybrid CNN-RNN model achieves a score of 66.31% on four-class classification of breathing cycles for the ICBHI'17 scientific challenge respiratory sound database. When the model is re-trained with patient specific data, it produces a score of 71.81% for leave-one-out validation. The proposed weight quantization technique achieves ~4X reduction in total memory cost without loss of performance. The main contributions of the paper are as follows. Firstly, the proposed model is able to achieve a state of the art score on the ICBHI'17 dataset. Secondly, deep learning models are shown to successfully learn domain specific knowledge when pre-trained with breathing data and produce significantly superior performance compared to generalized models. Finally, local log quantization of trained weights is shown to be able to reduce the memory requirement significantly. This type of patient-specific re-training strategy can be very useful in developing reliable long-term automated patient monitoring systems, particularly in wearable healthcare solutions.
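The abstract above does not spell out its local log quantization scheme, so the following is only a generic logarithmic (power-of-two) weight quantizer, included to show why storing a sign and a small exponent code instead of a 32-bit float shrinks memory. The function name, the per-tensor exponent range, and the bit split are all assumptions; the paper's "local" scheme may differ.

```python
import numpy as np

def log_quantize(weights, bits=8):
    """Round each weight to the nearest power of two (in the log domain).

    One bit is assumed for the sign; the remaining codes index exponents in a
    range anchored at the largest magnitude in this tensor (an assumption).
    """
    w = np.asarray(weights, dtype=np.float64)
    sign = np.sign(w)                          # 0 for exact zeros
    mag = np.abs(w)
    if mag.max() == 0:
        return np.zeros_like(w)
    n_levels = 2 ** (bits - 1) - 1             # exponent codes available
    max_exp = np.round(np.log2(mag.max()))
    min_exp = max_exp - (n_levels - 1)
    with np.errstate(divide="ignore"):         # log2(0) = -inf is handled below
        exp = np.round(np.log2(mag))
    exp = np.clip(exp, min_exp, max_exp)       # tiny weights clamp to 2**min_exp
    return sign * np.power(2.0, exp)           # exact zeros stay zero via sign

if __name__ == "__main__":
    w = np.array([0.5, -0.25, 0.3, 0.0, 1.0])
    print(log_quantize(w, bits=4))
```

With `bits=8`, each weight takes a quarter of the storage of a 32-bit float, which is in line with the roughly 4X reduction the abstract reports, though whether the paper uses this bit width is not stated there.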
IV · Mar 19, 2020
HyNNA: Improved Performance for Neuromorphic Vision Sensor based Surveillance using Hybrid Neural Network Architecture
Deepak Singla, Soham Chatterjee, Lavanya Ramapantulu et al.
Applications in the Internet of Video Things (IoVT) domain have very tight constraints with respect to power and area. While neuromorphic vision sensors (NVS) may offer advantages over traditional imagers in this domain, the existing NVS systems either do not meet the power constraints or have not demonstrated end-to-end system performance. To address this, we improve on a recently proposed hybrid event-frame approach by using morphological image processing algorithms for region proposal and address the low-power requirement for object detection and classification by exploring various convolutional neural network (CNN) architectures. Specifically, we compare the results obtained from our object detection framework against the state-of-the-art low-power NVS surveillance system and show an improved accuracy of 82.16% from 63.1%. Moreover, we show that using multiple bits does not improve accuracy, and thus, system designers can save power and area by using only single bit event polarity information. In addition, we explore the CNN architecture space for object classification and show useful insights to trade-off accuracy for lower power using lesser memory and arithmetic operations.
ET · Feb 27, 2020
Is my Neural Network Neuromorphic? Taxonomy, Recent Trends and Future Directions in Neuromorphic Engineering
Sumon Kumar Bose, Jyotibdha Acharya, Arindam Basu
In this paper, we review recent work published over the last 3 years under the umbrella of neuromorphic engineering to analyze the common features among such systems. We see that there is no clear consensus, but each system has one or more of the following features: (1) analog computing, (2) non-von Neumann architecture and low-precision digital processing, and (3) Spiking Neural Networks (SNN) with components closely related to biology. We compare recent machine learning accelerator chips to show that analog processing and reduced bit precision architectures indeed have the best throughput, energy and area efficiencies. However, pure digital architectures can also achieve quite high efficiencies by just adopting a non-von Neumann architecture. Given the design automation tools available for digital hardware design, this raises a question about the likelihood of adoption of analog processing in the near future for industrial designs. Next, we argue for the importance of defining standards and choosing proper benchmarks for the progress of neuromorphic system designs and propose some desired characteristics of such benchmarks. Finally, we show brain-machine interfaces as a potential task that fulfils all the criteria of such benchmarks.
LGDec 4, 2019
ADEPOS: A Novel Approximate Computing Framework for Anomaly Detection Systems and its Implementation in 65nm CMOSSumon Kumar Bose, Bapi Kar, Mohendra Roy et al.
To overcome the energy and bandwidth limitations of traditional IoT systems, edge computing or information extraction at the sensor node has become popular. However, it is now important to create very low energy information extraction or pattern recognition systems. In this paper, we present an approximate computing method to reduce the computation energy of a specific type of IoT system used for anomaly detection (e.g. in predictive maintenance, epileptic seizure detection, etc). Termed Anomaly Detection Based Power Savings (ADEPOS), our proposed method uses low precision computing and low complexity neural networks at the beginning, when it is easy to distinguish healthy data. However, on the detection of anomalies, the complexity of the network and computing precision are adaptively increased for accurate predictions. We show that ensemble approaches are well suited for adaptively changing network size. To validate our proposed scheme, a chip has been fabricated in a UMC 65nm process that includes an MSP430 microprocessor along with an on-chip switching mode DC-DC converter for dynamic voltage and frequency scaling. Using the NASA bearing dataset for machine health monitoring, we show that using ADEPOS we can achieve 8.95X saving of energy along the lifetime without losing any detection accuracy. The energy savings are obtained by reducing the execution time of the neural network on the microprocessor.
CVOct 22, 2019
A low-power end-to-end hybrid neuromorphic framework for surveillance applicationsAndres Ussa, Luca Della Vedova, Vandana Reddy Padala et al.
With the success of deep learning, object recognition systems that can be deployed for real-world applications are becoming commonplace. However, inference that needs to largely take place on the 'edge' (not processed on servers) is a highly computational and memory intensive workload, making it intractable for low-power mobile nodes and remote security applications. To address this challenge, this paper proposes a low-power (5W) end-to-end neuromorphic framework for object tracking and classification using event-based cameras that possess desirable properties such as low power consumption (5-14 mW) and high dynamic range (120 dB). Nonetheless, unlike traditional approaches of using event-by-event processing, this work uses a mixed frame and event approach to get energy savings with high performance. Using a frame-based region proposal method based on the density of foreground events, hardware-friendly object tracking is implemented using the apparent object velocity while tackling occlusion scenarios. For low-power classification of the tracked objects, the event camera is interfaced to IBM TrueNorth, which is time-multiplexed to tackle up to eight instances for a traffic monitoring application. The frame-based object track input is converted back to spikes for TrueNorth classification via the energy efficient deep network (EEDN) pipeline. Using originally collected datasets, we train the TrueNorth model on the hardware track outputs, instead of using ground truth object locations as commonly done, and demonstrate the efficacy of our system in handling practical surveillance scenarios. Finally, we compare the proposed methodologies to state-of-the-art event-based systems for object tracking and classification, and demonstrate the use case of our neuromorphic approach for low-power applications without sacrificing performance.
CVOct 4, 2019
EBBIOT: A Low-complexity Tracking Algorithm for Surveillance in IoVT Using Stationary Neuromorphic Vision SensorsJyotibdha Acharya, Andres Ussa Caycedo, Vandana Reddy Padala et al.
In this paper, we present EBBIOT, a novel paradigm for object tracking using stationary neuromorphic vision sensors in low-power sensor nodes for the Internet of Video Things (IoVT). Different from fully event-based tracking or fully frame-based approaches, we propose a mixed approach where we create event-based binary images (EBBI) that can use memory-efficient noise filtering algorithms. We exploit the motion triggering aspect of neuromorphic sensors to generate region proposals based on event density counts with >1000X less memory and computations compared to frame-based approaches. We also propose a simple overlap-based tracker (OT) with prediction-based handling of occlusion. Our overall approach requires 7X less memory and 3X fewer computations than conventional noise filtering and event-based mean shift (EBMS) tracking. Finally, we show that our approach results in significantly higher precision and recall compared to the EBMS approach as well as a Kalman Filter tracker when evaluated over 1.1 hours of traffic recordings at two different locations.
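The event-based binary image and density-count region proposal described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the helper names (`make_ebbi`, `active_runs`, `propose_regions`), the `(x, y)` event format, and the threshold are all assumptions introduced here.

```python
def make_ebbi(events, width, height):
    """Accumulate (x, y) events from one time window into a binary image."""
    image = [[0] * width for _ in range(height)]
    for x, y in events:
        image[y][x] = 1  # polarity and per-pixel event counts are discarded
    return image

def active_runs(histogram, threshold):
    """Return [start, end) index ranges where histogram >= threshold."""
    runs, start = [], None
    for i, count in enumerate(histogram):
        if count >= threshold and start is None:
            start = i
        elif count < threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(histogram)))
    return runs

def propose_regions(image, threshold=1):
    """Propose (x0, y0, x1, y1) boxes from 1-D event-density histograms."""
    col_hist = [sum(col) for col in zip(*image)]
    boxes = []
    for x0, x1 in active_runs(col_hist, threshold):
        # Row histogram restricted to the columns of this run.
        row_hist = [sum(row[x0:x1]) for row in image]
        for y0, y1 in active_runs(row_hist, threshold):
            boxes.append((x0, y0, x1, y1))
    return boxes

events = [(1, 1), (2, 1), (1, 2), (6, 4), (7, 4), (7, 5)]
print(propose_regions(make_ebbi(events, width=10, height=8)))
```

Because the image holds one bit per pixel and the proposals need only two short histograms, a scheme of this kind keeps memory use low, which is the property the abstract emphasizes.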
NEFeb 26, 2019
Spiking Neural Network based Region Proposal Networks for Neuromorphic Vision SensorsJyotibdha Acharya, Vandana Padala, Arindam Basu
This paper presents a three-layer spiking neural network based region proposal network operating on data generated by neuromorphic vision sensors. The proposed architecture consists of refractory, convolution and clustering layers designed with bio-realistic leaky integrate and fire (LIF) neurons and synapses. The proposed algorithm is tested on traffic scene recordings from a DAVIS sensor setup. The performance of the region proposal network has been compared with the event-based mean shift algorithm and is found to be far superior (~50% better) in recall for similar precision (~85%). The computational and memory complexity of the proposed method are also shown to be similar to those of event-based mean shift.
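For readers unfamiliar with the leaky integrate and fire (LIF) neuron mentioned above, a minimal discrete-time sketch follows. The function name `simulate_lif` and every parameter value (time constant, threshold, refractory period) are illustrative assumptions and are not taken from the paper.

```python
def simulate_lif(input_current, dt=1.0, tau=10.0, threshold=1.0,
                 v_reset=0.0, refractory_steps=2):
    """Return the time-step indices at which a single LIF neuron spikes."""
    v, refractory, spikes = 0.0, 0, []
    for step, current in enumerate(input_current):
        if refractory > 0:
            refractory -= 1  # input is ignored during the refractory period
            continue
        # Forward-Euler update: leak towards zero, then integrate the input.
        v += dt * (-v / tau + current)
        if v >= threshold:
            spikes.append(step)
            v = v_reset
            refractory = refractory_steps
    return spikes

print(simulate_lif([0.3] * 20))
```

The refractory counter is what lets a layer of such neurons suppress repeated events from the same location for a short time after each spike.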
CRDec 13, 2018
A 0.16pJ/bit Recurrent Neural Network Based PUF for Enhanced Machine Learning Atack ResistanceNimesh Shah, Manaar Alam, Durga Prasad Sahoo et al.
Physically Unclonable Function (PUF) circuits are finding widespread use due to the increasing adoption of IoT devices. However, existing strong PUFs such as Arbiter PUFs (APUF) and their compositions are susceptible to machine learning (ML) attacks because the challenge-response pairs have a linear relationship. In this paper, we present a Recurrent-Neural-Network PUF (RNN-PUF) which uses a combination of feedback and an XOR function to significantly improve resistance to ML attack, without significant reduction in reliability. ML attack is also partly reduced by using a shared comparator with offset-cancellation to remove bias and save power. From simulation results, we obtain an ML attack accuracy of 62% for different ML algorithms, while reliability stays above 93%. This represents a 33.5% improvement in our Figure-of-Merit. Power consumption is estimated to be 12.3uW with energy/bit of ~0.16pJ.
LGOct 19, 2018
A Stacked Autoencoder Neural Network based Automated Feature Extraction Method for Anomaly detection in On-line Condition MonitoringMohendra Roy, Sumon Kumar Bose, Bapi Kar et al.
Condition monitoring is one of the routine tasks in all major process industries. Mechanical parts such as motors, gears and bearings are the major components of a process industry, and any fault in them may cause a total shutdown of the whole process, which may result in serious losses. Therefore, it is crucial to predict any approaching defects before they occur. Several methods exist for this purpose, and much research is being carried out to find better and more efficient models. However, most of them are based on the processing of raw sensor signals, which is tedious and expensive. Recently, there has been an increase in feature-based condition monitoring, where only the useful features are extracted from the raw signals and interpreted for the prediction of the fault. Most of these are handcrafted features, obtained manually based on the nature of the raw data. This requires prior knowledge of the nature of the data and related processes, which limits the feature extraction process. Recent developments in autoencoder-based feature extraction provide an alternative to the traditional handcrafted approaches; however, they have mostly been confined to the area of image and audio processing. In this work, we have developed an automated feature extraction method for on-line condition monitoring based on a stack of the traditional autoencoder and an on-line sequential extreme learning machine (OSELM) network. The performance of this method is comparable to that of the traditional feature extraction approaches. The method can achieve 100% detection accuracy for determining the bearing health states of the NASA bearing dataset. The simple design of this method is promising for easy hardware implementation of Internet of Things (IoT) based prognostics solutions.
NEFeb 25, 2018
Power efficient Spiking Neural Network Classifier based on memristive crossbar network for spike sorting applicationAnand Kumar Mukhopadhyay, Indrajit Chakrabarti, Arindam Basu et al.
In this paper, the authors present a power-efficient scheme for implementing a spike sorting module. Spike sorting is an important application in the field of neural signal acquisition for implantable biomedical systems, whose function is to map the neural spikes (N-spikes) correctly to the neurons from which they originate. Accurate classification is a prerequisite for the succeeding systems needed in Brain-Machine Interfaces (BMIs) to give better performance. The primary design constraint to be satisfied by the spike sorter module is low power with good accuracy. There is a trade-off in terms of power consumption between on-chip and off-chip training of the N-spike features. In the former case, care has to be taken to make the computational units power efficient, whereas in the latter the data rate of wireless transmission should be minimized to reduce the power consumption due to the transceivers. In this work, a 2-step shared training scheme involving a K-means sorter and a Spiking Neural Network (SNN) is elaborated for on-chip training and classification. Also, a low-power SNN classifier scheme using a memristive crossbar type architecture is compared with a fully digital implementation. The advantage of the former classifier is that it is power efficient while providing accuracy comparable to that of the digital implementation, due to the robustness of the SNN training algorithm, which has a good tolerance for variation in memristance.
LGMay 3, 2016
VLSI Extreme Learning Machine: A Design Space ExplorationEnyi Yao, Arindam Basu
In this paper, we describe a compact, low-power, high-performance hardware implementation of the extreme learning machine (ELM) for machine learning applications. Mismatch in current mirrors is used to perform the vector-matrix multiplication that forms the first stage of this classifier and is the most computationally intensive. Both regression and classification (on UCI data sets) are demonstrated and a design space trade-off between speed, power and accuracy is explored. Our results indicate that for a wide set of problems, $σV_T$ in the range of $15-25$mV gives optimal results. An input weight matrix rotation method to extend the input dimension and hidden layer size beyond the physical limits imposed by the chip is also described. This allows us to overcome a major limit imposed on most hardware machine learners. The chip is implemented in a $0.35 μ$m CMOS process and occupies a die area of around 5 mm $\times$ 5 mm. Operating from a $1$ V power supply, it achieves an energy efficiency of $0.47$ pJ/MAC at a classification rate of $31.6$ kHz.
NEApr 19, 2016
An Online Structural Plasticity Rule for Generating Better ReservoirsSubhrajit Roy, Arindam Basu
In this article, a novel neuro-inspired low-resolution online unsupervised learning rule is proposed to train the reservoir or liquid of a Liquid State Machine. The liquid is a sparsely interconnected huge recurrent network of spiking neurons. The proposed learning rule is inspired by structural plasticity and trains the liquid through formation and elimination of synaptic connections. Hence, the learning involves rewiring of the reservoir connections similar to structural plasticity observed in biological neural networks. The network connections can be stored as a connection matrix and updated in memory by using Address Event Representation (AER) protocols which are generally employed in neuromorphic systems. On investigating the 'pairwise separation property' we find that trained liquids provide 1.36 $\pm$ 0.18 times more inter-class separation while retaining similar intra-class separation as compared to random liquids. Moreover, analysis of the 'linear separation property' reveals that trained liquids are 2.05 $\pm$ 0.27 times better than random liquids. Furthermore, we show that our liquids are able to retain the 'generalization' ability and 'generality' of random liquids. A memory analysis shows that trained liquids have 83.67 $\pm$ 5.79 ms longer fading memory than random liquids, which have shown 92.8 $\pm$ 5.03 ms fading memory for a particular type of spike train inputs. We also shed some light on the dynamics of the evolution of recurrent connections within the liquid. Moreover, compared to 'Separation Driven Synaptic Modification', a recently proposed algorithm for iteratively refining reservoirs, our learning rule provides 9.30%, 15.21% and 12.52% more liquid separations and 2.8%, 9.1% and 7.9% better classification accuracies for four, eight and twelve class pattern recognition tasks respectively.
ETDec 24, 2015
Hardware Architecture for Large Parallel Array of Random Feature Extractors applied to Image RecognitionAakash Patil, Shanlan Shen, Enyi Yao et al.
We demonstrate a low-power and compact hardware implementation of a Random Feature Extractor (RFE) core. With complex tasks like image recognition requiring a large set of features, we show how a weight reuse technique can virtually expand the random features available from the RFE core. Further, we show how to avoid the computation cost wasted in propagating "incognizant" or redundant random features. As a proof of concept, we validated our approach by using our RFE core as the first stage of an Extreme Learning Machine (ELM), a two-layer neural network, and were able to achieve $>97\%$ accuracy on the MNIST database of handwritten digits. ELM's first stage of RFE is done on an analog ASIC occupying $5$mm$\times5$mm area in $0.35μ$m CMOS and consuming $5.95$ $μ$J/classify while using $\approx 5000$ effective hidden neurons. The ELM second stage, consisting of just adders, can be implemented as a digital circuit with an estimated energy consumption of $20.9$ nJ/classify. With a total energy consumption of only $5.97$ $μ$J/classify, this low-power mixed signal ASIC can act as a co-processor in portable electronic gadgets with cameras.
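The total energy figure quoted above follows directly from the two per-stage numbers. The short check below uses only the values stated in the abstract; the variable names are ours.

```python
# Per-classification energies as stated in the abstract.
ANALOG_UJ = 5.95    # analog RFE first stage, in microjoules
DIGITAL_NJ = 20.9   # digital second stage (adders), in nanojoules

total_uj = ANALOG_UJ + DIGITAL_NJ / 1000.0          # convert nJ to uJ
digital_share = (DIGITAL_NJ / 1000.0) / total_uj    # fraction of the total

print(f"total = {total_uj:.2f} uJ/classify")
print(f"digital share = {digital_share:.2%}")
```

The sum is 5.9709 μJ, which rounds to the quoted 5.97 μJ/classify; the digital stage accounts for well under 1% of the total.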
NEDec 4, 2015
An Online Unsupervised Structural Plasticity Algorithm for Spiking Neural NetworksSubhrajit Roy, Arindam Basu
In this article, we propose a novel Winner-Take-All (WTA) architecture employing neurons with nonlinear dendrites and an online unsupervised structural plasticity rule for training it. Further, to aid hardware implementations, our network employs only binary synapses. The proposed learning rule is inspired by spike time dependent plasticity (STDP) but differs for each dendrite based on its activation level. It trains the WTA network through formation and elimination of connections between inputs and synapses. To demonstrate the performance of the proposed network and learning rule, we employ it to solve two, four and six class classification of random Poisson spike time inputs. The results indicate that by proper tuning of the inhibitory time constant of the WTA, a trade-off between specificity and sensitivity of the network can be achieved. We use the inhibitory time constant to set the number of subpatterns per pattern we want to detect. We show that while the percentage of successful trials is 92%, 88% and 82% for two, four and six class classification when no pattern subdivisions are made, it increases to 100% when each pattern is subdivided into 5 or 10 subpatterns. However, the former scenario of no pattern subdivision is more jitter resilient than the latter ones.
NEDec 3, 2015
Triplet Spike Time Dependent Plasticity: A floating-gate ImplementationRoshan Gopalakrishnan, Arindam Basu
Synapses play an important role in learning in a neural network; a learning rule that modifies the synaptic strength based on the timing difference between pre- and post-synaptic spikes is termed Spike Time Dependent Plasticity (STDP). The most commonly used rule posits a weight change based on the time difference between one pre- and one post-synaptic spike and is hence termed doublet STDP (D-STDP). However, D-STDP could not reproduce the results of many biological experiments; a triplet STDP (T-STDP) that considers triplets of spikes as the fundamental unit has been proposed recently to explain these observations. This paper describes the compact implementation of a synapse using a single floating-gate (FG) transistor that can store a weight in a nonvolatile manner and demonstrates the T-STDP learning rule by modifying drain voltages according to triplets of spikes. We describe a mathematical procedure to obtain control voltages for the FG device for T-STDP and also show measurement results from an FG synapse fabricated in a TSMC 0.35um CMOS process to support the theory. Possible VLSI implementations of drain voltage waveform generator circuits are also presented with simulation results.
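To make the doublet rule concrete, the sketch below implements the standard pair-based STDP window with exponential decay. It illustrates only the doublet baseline, not the triplet rule or the floating-gate circuit from the paper; the function name and all amplitudes and time constants are illustrative assumptions.

```python
import math

def doublet_stdp(dt_ms, a_plus=0.01, a_minus=0.012,
                 tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair.

    dt_ms = t_post - t_pre. Pre-before-post (dt_ms > 0) potentiates,
    post-before-pre (dt_ms < 0) depresses, and the effect decays
    exponentially with the size of the timing difference.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0  # simultaneous spikes: treated as no change in this sketch

for dt in (-40.0, -10.0, 10.0, 40.0):
    print(dt, doublet_stdp(dt))
```

Because this update depends on a single timing difference, it cannot capture effects that depend on the spacing of three spikes, which is the gap the triplet rule is meant to address.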
LGSep 22, 2015
A 128 channel Extreme Learning Machine based Neural Decoder for Brain Machine InterfacesYi Chen, Enyi Yao, Arindam Basu
Currently, state-of-the-art motor intention decoding algorithms in brain-machine interfaces are mostly implemented on a PC and consume a significant amount of power. A machine learning co-processor in 0.35um CMOS for motor intention decoding in brain-machine interfaces is presented in this paper. Using the Extreme Learning Machine algorithm and low-power analog processing, it achieves an energy efficiency of 290 GMACs/W at a classification rate of 50 Hz. The learning in the second stage and the corresponding digitally stored coefficients are used to increase the robustness of the core analog processor. The chip is verified with neural data recorded in a monkey finger movement experiment, achieving a decoding accuracy of 99.3% for movement type. The same co-processor is also used to decode time of movement from asynchronous neural spikes. With time-delayed feature dimension enhancement, the classification accuracy can be increased by 5% with a limited number of input channels. Further, a sparsity-promoting training scheme enables reduction of the number of programmable weights by ~2X.
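To compare the 290 GMACs/W figure above with efficiencies reported in pJ/MAC elsewhere in this list, the following conversion may help. It assumes GMACs/W means giga-MAC operations per second per watt, which is equivalent to giga-MACs per joule; the function name is ours.

```python
def gmacs_per_watt_to_pj_per_mac(gmacs_per_watt):
    """Convert throughput per watt into energy per MAC operation."""
    macs_per_joule = gmacs_per_watt * 1e9  # (MAC/s)/W equals MAC/J
    return 1e12 / macs_per_joule           # joules per MAC, in picojoules

print(f"{gmacs_per_watt_to_pj_per_mac(290):.2f} pJ/MAC")
```

Under that assumption, 290 GMACs/W corresponds to roughly 3.45 pJ per MAC.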
NEJun 17, 2015
Learning Spike time codes through Morphological Learning with Binary SynapsesSubhrajit Roy, Phyo Phyo San, Shaista Hussain et al.
In this paper, a neuron with nonlinear dendrites (NNLD) and binary synapses that is able to learn temporal features of spike input patterns is considered. Since binary synapses are considered, learning happens through formation and elimination of connections between the inputs and the dendritic branches to modify the structure or "morphology" of the NNLD. A morphological learning algorithm inspired by the 'Tempotron', a recently proposed temporal learning algorithm, is presented in this work. Unlike the 'Tempotron', the proposed learning rule uses a technique to automatically adapt the NNLD threshold during training. Experimental results indicate that our NNLD with 1-bit synapses can obtain accuracy similar to that of a traditional Tempotron with 4-bit synapses in classifying single spike random latency and pair-wise synchrony patterns. Hence, the proposed method is better suited for robust hardware implementation in the presence of statistical variations. We also present results of applying this rule to real-life spike classification problems from the field of tactile sensing.
NENov 20, 2014
Hardware-Amenable Structural Learning for Spike-based Pattern Classification using a Simple Model of Active DendritesShaista Hussain, Shih-Chii Liu, Arindam Basu
This paper presents a spike-based model which employs neurons with functionally distinct dendritic compartments for classifying high dimensional binary patterns. The synaptic inputs arriving on each dendritic subunit are nonlinearly processed before being linearly integrated at the soma, giving the neuron the capacity to perform a large number of input-output mappings. The model utilizes sparse synaptic connectivity, where each synapse takes a binary value. The optimal connection pattern of a neuron is learned by using a simple hardware-friendly, margin enhancing learning algorithm inspired by the mechanism of structural plasticity in biological neurons. The learning algorithm groups correlated synaptic inputs on the same dendritic branch. Since the learning results in modified connection patterns, it can be incorporated into current event-based neuromorphic systems with little overhead. This work also presents a branch-specific spike-based version of this structural plasticity rule. The proposed model is evaluated on benchmark binary classification problems and its performance is compared against that achieved using Support Vector Machine (SVM) and Extreme Learning Machine (ELM) techniques. Our proposed method attains comparable performance while utilizing 10 to 50% fewer computational resources than the other reported techniques.
ETNov 20, 2014
Liquid State Machine with Dendritically Enhanced Readout for Low-power, Neuromorphic VLSI ImplementationsSubhrajit Roy, Amitava Banerjee, Arindam Basu
In this paper, we describe a new neuro-inspired, hardware-friendly readout stage for the liquid state machine (LSM), a popular model for reservoir computing. Compared to the parallel perceptron architecture trained by the p-delta algorithm, which is the state of the art in terms of performance of readout stages, our readout architecture and learning algorithm can attain better performance with significantly less synaptic resources making it attractive for VLSI implementation. Inspired by the nonlinear properties of dendrites in biological neurons, our readout stage incorporates neurons having multiple dendrites with a lumped nonlinearity. The number of synaptic connections on each branch is significantly lower than the total number of connections from the liquid neurons and the learning algorithm tries to find the best 'combination' of input connections on each branch to reduce the error. Hence, the learning involves network rewiring (NRW) of the readout network similar to structural plasticity observed in its biological counterparts. We show that compared to a single perceptron using analog weights, this architecture for the readout can attain, even by using the same number of binary valued synapses, up to 3.3 times less error for a two-class spike train classification problem and 2.4 times less error for an input rate approximation task. Even with 60 times larger synapses, a group of 60 parallel perceptrons cannot attain the performance of the proposed dendritically enhanced readout. An additional advantage of this method for hardware implementations is that the 'choice' of connectivity can be easily implemented exploiting address event representation (AER) protocols commonly used in current neuromorphic systems where the connection matrix is stored in memory. Also, due to the use of binary synapses, our proposed method is more robust against statistical variations.
NENov 6, 2013
Delay Learning Architectures for Memory and ClassificationShaista Hussain, Arindam Basu, R. Wang et al.
We present a neuromorphic spiking neural network, the DELTRON, that can remember and store patterns by changing the delays of every connection as opposed to modifying the weights. The advantage of this architecture over traditional weight-based ones is simpler hardware implementation without multipliers or digital-analog converters (DACs), as well as being suited to time-based computing. The name derives from the similarity of its learning rule to that of an earlier architecture called the Tempotron. The DELTRON can remember more patterns than other delay-based networks by modifying a few delays to remember the most 'salient' or synchronous part of every spike pattern. We present simulations of the memory capacity and classification ability of the DELTRON for different random spatio-temporal spike patterns. The memory capacity for noisy spike patterns and missing spikes is also shown. Finally, we present SPICE simulation results of the core circuits involved in a reconfigurable mixed signal implementation of this architecture.