AR · May 11, 2022
Process, Bias and Temperature Scalable CMOS Analog Computing Circuits for Machine Learning
Pratik Kumar, Ankita Nandi, Shantanu Chakrabartty et al.
Analog computing is attractive compared to digital computing due to its potential for achieving higher computational density and higher energy efficiency. However, unlike digital circuits, conventional analog computing circuits cannot be easily mapped across different process nodes due to differences in transistor biasing regimes, temperature variations and limited dynamic range. In this work, we generalize the previously reported margin-propagation-based analog computing framework for designing novel shape-based analog computing (S-AC) circuits that can be easily cross-mapped across different process nodes. Similar to digital designs, S-AC designs can also be scaled for precision, speed, and power. As a proof-of-concept, we show several examples of S-AC circuits implementing mathematical functions that are commonly used in machine learning (ML) architectures. Using circuit simulations, we demonstrate that the circuit input/output characteristics remain robust when mapped from a planar CMOS 180nm process to a FinFET 7nm process. Also, using benchmark datasets, we demonstrate that the classification accuracy of an S-AC-based neural network remains robust when mapped across the two processes and to changes in temperature.
ET · Jun 27, 2022
On-device Synaptic Memory Consolidation using Fowler-Nordheim Quantum-tunneling
Mustafizur Rahman, Subhankar Bose, Shantanu Chakrabartty
Synaptic memory consolidation has been heralded as one of the key mechanisms for supporting continual learning in neuromorphic Artificial Intelligence (AI) systems. Here we report that a Fowler-Nordheim (FN) quantum-tunneling device can implement synaptic memory consolidation similar to what can be achieved by algorithmic consolidation models like the cascade and the elastic weight consolidation (EWC) models. The proposed FN-synapse stores not only the synaptic weight but also the synapse's historical usage statistics on the device itself. We also show that the operation of the FN-synapse is near-optimal in terms of synaptic lifetime, and we demonstrate that a network comprising FN-synapses outperforms a comparable EWC network on a small benchmark continual learning task. With an energy footprint of femtojoules per synaptic update, we believe that the proposed FN-synapse provides an ultra-energy-efficient approach for implementing both synaptic memory consolidation and persistent learning.
LG · Apr 24, 2023
Multiplierless In-filter Computing for tinyML Platforms
Abhishek Ramdas Nair, Pallab Kumar Nath, Shantanu Chakrabartty et al.
Wildlife conservation through continuous monitoring of environmental factors, as well as biomedical classification, generates a vast amount of sensor data, which is challenging to handle given the limited bandwidth available for remote monitoring. It is therefore critical to perform classification where the data is generated, so that only the classified data is used for monitoring. We present a novel multiplierless framework for in-filter acoustic classification using a Margin Propagation (MP) approximation, intended for low-power edge devices deployable in remote areas with limited connectivity. The entire classification framework, including feature extraction and inference, is based on a template-based kernel machine and uses basic primitives such as addition/subtraction, shift, and comparator operations for hardware implementation. Unlike traditional classification methods that rely on full-precision training, we also use the MP-based approximation for training, including backpropagation, which mitigates approximation errors. The proposed framework is general enough for acoustic classification. We demonstrate its hardware friendliness by implementing a parallel Finite Impulse Response (FIR) filter bank in a kernel machine classifier optimized for a Field Programmable Gate Array (FPGA). The FIR filter acts as both the feature extractor and the non-linear kernel of the kernel machine. The kernel machine is implemented using the MP approximation together with a downsampling method that reduces the filter order. The FPGA implementation on a Spartan 7 shows that the MP-approximated in-filter kernel machine is more efficient than traditional classification frameworks while using fewer than 1K slices.
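The following is a minimal sketch of a multiplierless FIR filter that makes the "addition/subtraction and shift" primitive concrete. It assumes every tap is quantized to a signed power of two, so each multiply becomes an arithmetic right shift. The `(sign, shift)` tap format and the function name are hypothetical illustrations. This is not the paper's filter-bank design or its MP formulation.

```python
from typing import List, Tuple

# Hypothetical tap format: (sign, shift) encodes the coefficient sign * 2**(-shift).
Tap = Tuple[int, int]


def shift_add_fir(x: List[int], taps: List[Tap]) -> List[int]:
    """Filter integer samples x using only shifts, additions and subtractions.

    y[n] = sum_k sign_k * (x[n - k] >> shift_k), with x[n] = 0 for n < 0.
    Python's >> on ints is an arithmetic (floor) shift, like a hardware shifter.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k, (sign, shift) in enumerate(taps):
            if n - k < 0:
                break  # causal filter: no samples before the start of the signal
            term = x[n - k] >> shift  # multiply by 2**(-shift) without a multiplier
            acc = acc + term if sign > 0 else acc - term
        y.append(acc)
    return y


# Example: taps approximating [0.5, 0.25, -0.125] on a short integer input.
taps = [(1, 1), (1, 2), (-1, 3)]
print(shift_add_fir([16, 32, 8, 0], taps))  # [8, 20, 10, -2]
```

Restricting taps to powers of two trades some frequency-response accuracy for a datapath that needs only barrel shifters and adders.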
LG · Apr 18, 2023
A Framework for Analyzing Cross-correlators using Price's Theorem and Piecewise-Linear Decomposition
Zhili Xiao, Shantanu Chakrabartty
Precise estimation of cross-correlation or similarity between two random variables lies at the heart of signal detection, hyperdimensional computing, associative memories, and neural networks. Although a vast literature exists on different methods for estimating cross-correlations, the question "what is the best and simplest method to estimate cross-correlations using finite samples?" remains open. In this paper, we first argue that the standard empirical approach might not be the optimal method, even though the estimator exhibits uniform convergence to the true cross-correlation. Instead, we show that there exists a large class of simple non-linear functions that can be used to construct cross-correlators with a higher signal-to-noise ratio (SNR). To demonstrate this, we first present a general mathematical framework using Price's Theorem that allows us to analyze cross-correlators constructed using a mixture of piecewise-linear functions. Using this framework and high-dimensional embedding, we show that some of the most promising cross-correlators are based on Huber's loss functions, margin-propagation (MP) functions, and log-sum-exp (LSE) functions.
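The sketch below illustrates how a non-linear correlator can be related back to the true correlation. It uses the classical arcsine law for zero-mean, unit-variance jointly Gaussian variables, E[sign(x) sign(y)] = (2/π) arcsin(ρ), which is one consequence of Price's Theorem. The sign function is only a simple piecewise-constant example chosen for brevity. It is not the Huber, MP or LSE correlator studied in the paper, and the Gaussian data model is an assumption.

```python
import math
import random
from typing import List, Tuple


def correlated_gaussians(n: int, rho: float, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Draw n pairs of zero-mean, unit-variance Gaussians with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(u)
        ys.append(rho * u + math.sqrt(1.0 - rho * rho) * v)
    return xs, ys


def empirical_correlator(xs: List[float], ys: List[float]) -> float:
    """Standard empirical estimator: the sample mean of x * y."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)


def sign_correlator(xs: List[float], ys: List[float]) -> float:
    """Non-linear estimator: average sign(x) * sign(y), then invert the arcsine law."""
    r = sum((1 if x >= 0 else -1) * (1 if y >= 0 else -1) for x, y in zip(xs, ys)) / len(xs)
    return math.sin(math.pi * r / 2.0)


xs, ys = correlated_gaussians(20000, rho=0.5)
# Both estimates should land near rho = 0.5.
print(empirical_correlator(xs, ys), sign_correlator(xs, ys))
```

The sign correlator's product involves only two ±1 values, which in hardware reduces to an XNOR rather than a multiplier.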
LG · Feb 21, 2024
Estimation of Energy-dissipation Lower-bounds for Neuromorphic Learning-in-memory
Zihao Chen, Faiek Ahsan, Johannes Leugering et al.
Neuromorphic or neurally-inspired optimizers rely on local but parallel parameter updates to solve problems that range from quadratic programming to Ising machines. An ideal realization of such an optimizer uses a compute-in-memory (CIM) paradigm to address the so-called memory-wall (i.e., energy dissipated due to repeated memory read accesses). It also uses a learning-in-memory (LIM) paradigm to address two further energy bottlenecks: repeated memory writes at the precision required for optimization (the update-wall), and the repeated transfer of information between short-term and long-term memories (the consolidation-wall). In this paper, we derive theoretical estimates for the energy-to-solution metric achievable by this ideal neuromorphic optimizer. The optimizer is realized by modulating the energy barrier of the physical memories so that the dynamics of memory updates and memory consolidation match the optimization or annealing dynamics. The analysis captures the out-of-equilibrium thermodynamics of learning. The resulting energy-efficiency estimates are model-agnostic, depending only on the number of model-update operations (OPS), the model size in terms of number of parameters, the speed of convergence, and the precision of the solution. To show the practical applicability of our results, we apply our analysis to estimate the lower bound on the energy-to-solution metric for large-scale AI workloads.
NE · Jun 24, 2025
Higher-Order Neuromorphic Ising Machines -- Autoencoders and Fowler-Nordheim Annealers are all you need for Scalability
Faiek Ahsan, Saptarshi Maiti, Zihao Chen et al.
We report a higher-order neuromorphic Ising machine that exhibits superior scalability compared to architectures based on quadratization. It also achieves state-of-the-art solution quality and reliability with competitive time-to-solution metrics. At the core of the proposed machine is an asynchronous autoencoder architecture that captures higher-order interactions by directly manipulating Ising clauses instead of Ising spins, thereby keeping resource complexity independent of interaction order. Asymptotic convergence to the Ising ground state is ensured by sampling the autoencoder latent space defined by the spins, based on the annealing dynamics of Fowler-Nordheim quantum-mechanical tunneling. To demonstrate the advantages of the proposed higher-order neuromorphic Ising machine, we systematically solved benchmark combinatorial optimization problems such as MAX-CUT and MAX-SAT. We compared the results to those obtained using a second-order Ising machine employing the same annealing process. Our findings indicate that the proposed architecture consistently provides higher-quality solutions in shorter time frames than the second-order model across multiple runs. Additionally, we show that techniques based on the sparsity of the interconnection matrix, such as graph coloring, can be effectively applied to higher-order neuromorphic Ising machines, enhancing both solution quality and time-to-solution. The time-to-solution can be further improved through hardware co-design, as demonstrated in this paper using a field-programmable gate array (FPGA). The results presented in this paper provide further evidence that autoencoders and Fowler-Nordheim annealers are sufficient to achieve reliability and scaling of any-order neuromorphic Ising machines.
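The following is a minimal sketch of a higher-order Ising energy, E(s) = -Σ_k J_k Π_{i∈C_k} s_i, where each clause C_k may couple any number of spins. The energy is minimized with plain Metropolis simulated annealing. The logarithmic cooling schedule T(t) = T0 / ln(2 + t) is an assumed stand-in for the Fowler-Nordheim annealing dynamics. The single-spin-flip sampler is not the paper's autoencoder architecture, and the clause list is a made-up toy instance.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple

# A clause is (J_k, indices of the spins it couples); clauses may have any order.
Clause = Tuple[float, Tuple[int, ...]]


def energy(spins: Sequence[int], clauses: List[Clause]) -> float:
    """Higher-order Ising energy E(s) = -sum_k J_k * prod_{i in C_k} s_i."""
    total = 0.0
    for j, idx in clauses:
        prod = 1
        for i in idx:
            prod *= spins[i]
        total -= j * prod
    return total


def anneal(n: int, clauses: List[Clause], steps: int = 5000, t0: float = 2.0, seed: int = 0):
    """Metropolis single-spin-flip annealing; returns the best state seen and its energy."""
    rng = random.Random(seed)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    e = energy(s, clauses)
    best_s, best_e = list(s), e
    for t in range(steps):
        temp = t0 / math.log(2.0 + t)  # assumed logarithmic cooling schedule
        i = rng.randrange(n)
        s[i] = -s[i]  # propose flipping one spin
        e_new = energy(s, clauses)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temp):
            e = e_new  # accept the move
            if e < best_e:
                best_s, best_e = list(s), e
        else:
            s[i] = -s[i]  # reject: undo the flip
    return best_s, best_e


# Toy instance mixing second-, third- and fourth-order clauses.
clauses = [(1.0, (0, 1, 2)), (-1.0, (1, 2, 3)), (0.5, (0, 3)), (1.0, (0, 1, 2, 3))]
best_s, best_e = anneal(4, clauses)
brute = min(energy(s, clauses) for s in itertools.product((-1, 1), repeat=4))
print(best_e, brute)
```

Because the energy is evaluated clause by clause, the cost per step grows with the number of clauses rather than with any quadratized auxiliary-spin expansion.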
SY · Oct 30, 2024
KALAM: toolKit for Automating high-Level synthesis of Analog computing systeMs
Ankita Nandi, Krishil Gandhi, Mahendra Pratap Singh et al.
Diverse computing paradigms have emerged to meet the growing need for intelligent, energy-efficient systems. The Margin Propagation (MP) framework, one such initiative in the analog computing domain, stands out due to its scalability across biasing conditions, temperatures, and diminishing process technology nodes. However, the lack of digital-like automation tools for designing analog systems (including MP analog systems) hinders their adoption for designing large systems. The inherent scalability and modularity of MP systems present a unique opportunity in this regard. This paper introduces KALAM (toolKit for Automating high-Level synthesis of Analog computing systeMs), which leverages factor graphs as the foundational paradigm for synthesizing MP-based analog computing systems. Factor graphs are the basis of various signal processing tasks and, when coupled with MP, can be used to design scalable and energy-efficient analog signal processors. Using the Python scripting language, the KALAM automation flow translates an input factor graph into its equivalent SPICE-compatible circuit netlist, which can be used to validate the intended functionality. KALAM also allows the integration of design optimization strategies such as precision tuning, variable elimination, and mathematical simplification. We demonstrate KALAM's versatility for tasks such as Bayesian inference, Low-Density Parity Check (LDPC) decoding, and Artificial Neural Networks (ANN). Simulation results of the netlists align closely with software implementations, affirming the efficacy of our proposed automation tool.
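The sketch below gives a flavor of what translating a factor graph into a SPICE-compatible netlist can look like. It emits one subcircuit instance line per factor node, using standard SPICE `X` instance syntax, and wires variable nodes to shared net names. The cell names such as `MP_FACTOR2`, the `mp_cells.lib` library file, and the net-naming scheme are hypothetical placeholders. This is not KALAM's actual code generator or cell library.

```python
from typing import Dict, List


def factor_graph_to_netlist(factors: Dict[str, List[str]], title: str = "factor graph") -> str:
    """Emit a SPICE-style netlist with one subcircuit instance per factor node.

    `factors` maps a factor name to the variable nodes it connects. Each factor
    becomes an instance of a hypothetical MP_FACTOR<k> cell (k = its degree) with
    one net per connected variable plus a dedicated output net for its message.
    Variables shared by several factors end up on the same net.
    """
    lines = [
        f"* {title}",
        "* mp_cells.lib is a hypothetical library of MP cells",
        ".include mp_cells.lib",
    ]
    for name, variables in factors.items():
        nets = " ".join(f"v_{var}" for var in variables)
        lines.append(f"X{name} {nets} m_{name} MP_FACTOR{len(variables)}")
    lines.append(".end")
    return "\n".join(lines)


# Toy graph: factor fa couples x1 and x2, factor fb couples x2 and x3.
print(factor_graph_to_netlist({"fa": ["x1", "x2"], "fb": ["x2", "x3"]}))
```

Mapping every variable to a single named net is what lets the simulator resolve the graph's edges. The connectivity of the factor graph becomes the connectivity of the circuit.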
ET · Feb 10, 2022
Bias-Scalable Near-Memory CMOS Analog Processor for Machine Learning
Pratik Kumar, Ankita Nandi, Shantanu Chakrabartty et al.
Bias-scalable analog computing is attractive for implementing machine learning (ML) processors with distinct power-performance specifications. For instance, ML implementations for server workloads are focused on higher computational throughput for faster training, whereas ML implementations for edge devices are focused on energy-efficient inference. In this paper, we demonstrate the implementation of bias-scalable approximate analog computing circuits using a generalization of the margin-propagation principle called shape-based analog computing (S-AC). The resulting S-AC core integrates several near-memory compute elements, which include: (a) non-linear activation functions; (b) inner-product compute circuits; and (c) a mixed-signal compressive memory, all of which can be scaled for performance or power while preserving their functionality. Using measured results from prototypes fabricated in a 180nm CMOS process, we demonstrate that the performance of the computing modules remains robust to transistor biasing and variations in temperature. We also demonstrate the effect of bias-scalability and computational accuracy on a simple ML regression task.
AS · Sep 11, 2021
In-filter Computing For Designing Ultra-light Acoustic Pattern Recognizers
Abhishek Ramdas Nair, Shantanu Chakrabartty, Chetan Singh Thakur
We present a novel in-filter computing framework that can be used for designing ultra-light acoustic classifiers for smart Internet-of-Things (IoT) devices. In a conventional acoustic pattern recognizer, the feature extraction and classification are designed independently. The proposed architecture instead integrates the convolution and nonlinear filtering operations directly into the kernels of a Support Vector Machine (SVM). The result of this integration is a template-based SVM whose memory and computational footprint (training and inference) is light enough to be implemented on an FPGA-based IoT platform. While the proposed in-filter computing framework is general, in this paper we demonstrate the concept using an acoustic feature extraction algorithm based on a Cascade of Asymmetric Resonators with Inner Hair Cells (CAR-IHC). The complete system has been optimized using time-multiplexing and parallel-pipeline techniques for a Xilinx Spartan 7 series Field Programmable Gate Array (FPGA). We show that the system can achieve robust classification performance on benchmark sound recognition tasks using only ~1.5k Look-Up Tables (LUTs) and ~2.8k Flip-Flops (FFs), a significant improvement over other approaches.
SD · Aug 21, 2021
Using growth transform dynamical systems for spatio-temporal data sonification
Oindrila Chatterjee, Shantanu Chakrabartty
Sonification, or encoding information in meaningful audio signatures, has several advantages in augmenting or replacing traditional visualization methods for human-in-the-loop decision-making. Standard sonification methods reported in the literature involve either (i) using only a subset of the variables, or (ii) first solving a learning task on the data and then mapping the output to an audio waveform, which the end-user relies on to make a decision. This paper presents a novel framework for sonifying high-dimensional data using a complex growth transform dynamical system model, in which the learning (or, more generally, optimization) and sonification processes are integrated. Our algorithm takes as input the data and optimization parameters underlying the learning or prediction task and combines them with psychoacoustic parameters defined by the user. As a result, the proposed framework outputs binaural audio signatures that not only encode some statistical properties of the high-dimensional data but also reveal the underlying complexity of the optimization/learning process. Along with extensive experiments using synthetic datasets, we demonstrate the framework on sonifying electroencephalogram (EEG) data, with the potential for detecting epileptic seizures in pediatric patients.
LG · Jun 3, 2021
Multiplierless MP-Kernel Machine For Energy-efficient Edge Devices
Abhishek Ramdas Nair, Pallab Kumar Nath, Shantanu Chakrabartty et al.
We present a novel framework for designing multiplierless kernel machines that can be used on resource-constrained platforms like intelligent edge devices. The framework uses a piecewise linear (PWL) approximation based on a margin propagation (MP) technique and uses only addition/subtraction, shift, comparison, and register underflow/overflow operations. We propose a hardware-friendly MP-based inference and online training algorithm that has been optimized for a Field Programmable Gate Array (FPGA) platform. Our FPGA implementation eliminates the need for DSP units and reduces the number of LUTs. By reusing the same hardware for inference and training, we show that the platform can overcome classification errors and local minima artifacts that result from the MP approximation. The implementation of this proposed multiplierless MP-kernel machine on FPGA results in an estimated energy consumption of 13.4 pJ and power consumption of 107 mW with ~9k LUTs and FFs each for a 256 x 32 sized kernel making it superior in terms of power, performance, and area compared to other comparable implementations.
NE · Apr 13, 2021
An Adaptive Synaptic Array using Fowler-Nordheim Dynamic Analog Memory
Darshit Mehta, Kenji Aono, Shantanu Chakrabartty
In this paper, we present a synaptic array that uses dynamical states to implement an analog memory for energy-efficient training of machine learning (ML) systems. Each of the analog memory elements is a micro-dynamical system driven by the physics of Fowler-Nordheim (FN) quantum tunneling, whereas system-level learning modulates the state trajectory of the memory ensembles towards the optimal solution. We show that the extrinsic energy required for modulation can be matched to the dynamics of learning and weight decay. This leads to a significant reduction in the energy dissipated during ML training. The array achieves energy dissipation as low as 5 fJ per memory update and a programming resolution of up to 14 bits. It could therefore be used to address the energy-efficiency imbalance between the training and inference phases observed in artificial intelligence (AI) systems.
CR · Apr 9, 2021
SPoTKD: A Protocol for Symmetric Key Distribution over Public Channels Using Self-Powered Timekeeping Devices
Mustafizur Rahman, Liang Zhou, Shantanu Chakrabartty
In this paper, we propose a novel class of symmetric key distribution protocols that leverages basic security primitives offered by low-cost hardware chipsets containing millions of synchronized self-powered timers. The keys are derived from the temporal dynamics of a physical, micro-scale timekeeping device, which makes the keys immune to any potential side-channel attacks, malicious tampering, or snooping. Using the behavioral model of the self-powered timers, we first show that the derived key-strings pass the randomness tests defined in the National Institute of Standards and Technology (NIST) suite. The key-strings are then used in two SPoTKD (Self-Powered Timer Key Distribution) protocols that exploit the timer's dynamics as one-way functions: (a) protocol 1 facilitates secure communication between a user and a remote server, and (b) protocol 2 facilitates secure communication between two users. We investigate the security of these protocols under the standard model and against different adversarial attacks. Using Monte-Carlo simulations, we also investigate the robustness of these protocols under real-world operating conditions. We then propose error-correcting SPoTKD protocols to mitigate the resulting noise-related artifacts.
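The abstract refers to the NIST randomness test suite. The sketch below implements the frequency (monobit) test from NIST SP 800-22 to illustrate what one such check involves; it is only the first test in the suite. The bit strings are made-up inputs, not keys derived from the self-powered timers.

```python
import math


def monobit_p_value(bits: str) -> float:
    """NIST SP 800-22 frequency (monobit) test on a string of '0'/'1' characters.

    Each bit is mapped to +1/-1 and summed into S_n; under the randomness
    hypothesis, p = erfc(|S_n| / sqrt(2 n)).
    """
    n = len(bits)
    s_n = sum(1 if b == "1" else -1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2.0))


def passes_monobit(bits: str, alpha: float = 0.01) -> bool:
    """The sequence is considered random for this test if p >= alpha (0.01 in SP 800-22)."""
    return monobit_p_value(bits) >= alpha


print(passes_monobit("01" * 64))  # True: balanced string, S_n = 0, p = 1.0
print(passes_monobit("1" * 128))  # False: all ones, p is essentially 0
```

A real key-string would have to pass this and the remaining SP 800-22 tests (runs, block frequency, and so on) before the randomness claim holds.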
LG · Oct 5, 2019
Multiplierless and Sparse Machine Learning based on Margin Propagation Networks
Nazreen P. M., Shantanu Chakrabartty, Chetan Singh Thakur
The new generation of machine learning processors has evolved from multi-core and parallel architectures that were designed to efficiently implement matrix-vector multiplications (MVMs). This is because, at the fundamental level, neural network and machine learning operations extensively use MVM operations, and hardware compilers exploit the inherent parallelism in MVM operations to achieve hardware acceleration on GPUs and FPGAs. However, many IoT and edge computing platforms require embedded ML devices close to the network in order to compensate for communication cost and latency. Hence, a natural question to ask is whether MVM operations are even necessary to implement ML algorithms, and whether simpler hardware primitives can be used to implement an ultra-energy-efficient ML processor/architecture. In this paper, we propose an alternative hardware-software codesign of ML and neural network architectures. Instead of using MVM operations and non-linear activation functions, the architecture uses only simple addition and thresholding operations to implement inference and learning. At the core of the proposed approach is margin-propagation (MP) based computation that maps multiplications into additions and additions into dynamic rectified-linear-unit (ReLU) operations. This mapping significantly reduces the computational cost and hence the energy cost. In this paper, we show how the MP network formulation can be applied to designing linear classifiers, shallow multi-layer perceptrons, and support vector networks suitable for IoT platforms and tinyML applications. We show that these MP-based classifiers give results comparable to those of their traditional counterparts on benchmark UCI datasets. They also offer reduced computational complexity, enabling an improvement in energy efficiency.
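The margin-propagation idea can be sketched in a few lines. In the MP formulation, the output z of an MP unit with inputs x_1,…,x_n and a margin parameter γ > 0 is the solution of Σ_i [x_i - z]_+ = γ, where [·]_+ is the ReLU. The unit behaves like a piecewise-linear surrogate for log-sum-exp. The sketch below solves this constraint exactly by sorting, using additions, comparisons and one division by the count of active inputs. It is an assumed, textbook-style solver, not the paper's hardware algorithm, which may rely on iterative or fixed-point updates instead.

```python
from typing import Sequence


def relu(v: float) -> float:
    """Rectified linear unit [v]_+."""
    return v if v > 0.0 else 0.0


def margin_propagation(x: Sequence[float], gamma: float) -> float:
    """Return z such that sum_i relu(x_i - z) == gamma (requires gamma > 0).

    Sort the inputs in descending order and grow the set of 'active' inputs
    (those above z) until the candidate z is no longer below the next input.
    """
    if gamma <= 0.0:
        raise ValueError("gamma must be positive")
    xs = sorted(x, reverse=True)
    running_sum = 0.0
    for k, v in enumerate(xs, start=1):
        running_sum += v
        z = (running_sum - gamma) / k  # z if exactly the top-k inputs are active
        next_x = xs[k] if k < len(xs) else float("-inf")
        if z >= next_x:  # the (k+1)-th input would contribute nothing: done
            return z
    raise ValueError("x must be non-empty")


x = [1.0, 2.0, 3.0]
z = margin_propagation(x, gamma=1.0)
print(z, sum(relu(v - z) for v in x))  # 2.0 1.0
```

Consider an inner product computed on log-domain operands. Each product becomes the sum of two operands, and the accumulating sum can then be approximated by an MP unit in place of log-sum-exp. This is roughly the "multiplications into additions, additions into ReLU" mapping the abstract describes.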
LG · Aug 15, 2019
Resonant Machine Learning Based on Complex Growth Transform Dynamical Systems
Oindrila Chatterjee, Shantanu Chakrabartty
Traditional energy-based learning models associate a single energy metric with each configuration of variables involved in the underlying optimization process. Such models associate the lowest energy state with the optimal configuration of variables under consideration, and are thus inherently dissipative. In this paper, we propose an energy-efficient learning framework that exploits structural and functional similarities between a machine learning network and a general electrical network satisfying Tellegen's theorem. In contrast to standard energy-based models, the proposed formulation associates two energy components, namely active and reactive energy, with the network. This ensures that the network's active power is dissipated only during the process of learning, whereas the reactive power is maintained at zero at all times. As a result, in steady state, the learned parameters are stored and self-sustained by electrical resonance determined by the network's nodal inductances and capacitances. Based on this approach, this paper introduces three novel concepts: (a) a learning framework where the network's active-power dissipation is used as a regularization for a learning objective function that is subject to a zero total reactive-power constraint; (b) a dynamical system based on complex-domain, continuous-time growth transforms, which optimizes the learning objective function and drives the network towards electrical resonance under steady-state operation; and (c) an annealing procedure that controls the trade-off between active-power dissipation and the speed of convergence. As a representative example, we show how the proposed framework can be used for designing resonant support vector machines (SVMs), in which the support vectors correspond to an LC network with self-sustained oscillations.
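The steady-state condition, where reactive power sums to zero while stored energy keeps oscillating, can be checked on the simplest case: a parallel LC pair driven at RMS voltage V. Under the usual sign convention, the inductor absorbs reactive power Q_L = V²/(ωL) and the capacitor supplies Q_C = -ωCV². The total therefore vanishes exactly at the resonant frequency ω₀ = 1/√(LC). The component values below are arbitrary illustrations, not parameters of the paper's resonant SVM.

```python
import math


def reactive_power(v_rms: float, omega: float, inductance: float, capacitance: float) -> float:
    """Net reactive power of a parallel LC pair driven at angular frequency omega.

    Inductor: Q_L = V^2 / (omega * L), absorbed (positive by convention).
    Capacitor: Q_C = -omega * C * V^2, supplied (negative by convention).
    """
    q_l = v_rms ** 2 / (omega * inductance)
    q_c = -omega * capacitance * v_rms ** 2
    return q_l + q_c


L, C = 1e-3, 1e-6                # 1 mH and 1 uF, arbitrary example values
omega0 = 1.0 / math.sqrt(L * C)  # resonant angular frequency, about 31623 rad/s
print(reactive_power(1.0, omega0, L, C))      # ~0 up to floating-point round-off
print(reactive_power(1.0, 2 * omega0, L, C))  # negative: capacitive above resonance
```

At resonance, energy shuttles between the inductor and capacitor without any net reactive exchange. This is the property the framework uses to hold learned parameters without continuous dissipation.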
NE · Nov 5, 2018
A Unified Perspective of Evolutionary Game Dynamics Using Generalized Growth Transforms
Oindrila Chatterjee, Shantanu Chakrabartty
In this paper, we show that different types of evolutionary game dynamics are, in principle, special cases of a dynamical system model based on our previously reported framework of generalized growth transforms. The framework shows that different dynamics arise as a result of minimizing a population energy such that the population as a whole evolves to reach the most stable state. By introducing a population dependent time-constant in the generalized growth transform model, the proposed framework can be used to explain a vast repertoire of evolutionary dynamics, including some novel forms of game dynamics with non-linear payoffs.
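A concrete classical instance of this connection is the discrete-time replicator dynamics, x_i ← x_i (Ax)_i / (xᵀAx). For a symmetric payoff matrix A with non-negative entries, this coincides with the Baum-Eagon growth transform for maximizing the mean payoff xᵀAx over the probability simplex. The sketch below iterates it for a two-strategy coordination game. The payoff matrix is a made-up example, and this fixed-time-constant update is the classical special case. It is not the generalized growth transform with a population-dependent time-constant proposed in the paper.

```python
from typing import List


def mean_payoff(x: List[float], a: List[List[float]]) -> float:
    """Population mean payoff x^T A x."""
    return sum(x_i * sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for x_i, row in zip(x, a))


def growth_transform_step(x: List[float], a: List[List[float]]) -> List[float]:
    """One discrete replicator / Baum-Eagon step: x_i <- x_i (A x)_i / (x^T A x)."""
    fitness = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]
    avg = sum(x_i * f_i for x_i, f_i in zip(x, fitness))
    return [x_i * f_i / avg for x_i, f_i in zip(x, fitness)]


# Symmetric coordination game with non-negative payoffs (hypothetical values).
A = [[2.0, 0.0], [0.0, 1.0]]
x = [0.5, 0.5]
history = [mean_payoff(x, A)]
for _ in range(50):
    x = growth_transform_step(x, A)
    history.append(mean_payoff(x, A))
print(x)  # the population concentrates on strategy 0
```

Each step keeps x on the simplex and never decreases the mean payoff. Under the growth-transform view, the population therefore climbs toward a stable state.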
CR · Nov 24, 2017
A Fowler-Nordheim Integrator can Track the Density of Prime Numbers
Liang Zhou, SriHarsha Kondapalli, Shantanu Chakrabartty
"Does there exist a naturally occurring counting device that might elucidate the hidden structure of prime numbers ?" is a question that has fascinated computer scientists and mathematical physicists for decades. While most recent research in this area have explored the role of the Riemann zeta-function in different formulations of statistical mechanics, condensed matter physics and quantum chaotic systems, the resulting devices (quantum or classical) have only existed in theory or the fabrication of the device has been found to be not scalable to large prime numbers. Here we report for the first time that any hypothetical prime number generator, to our knowledge, has to be a special case of a dynamical system that is governed by the physics of Fowler-Nordheim (FN) quantum-tunneling. In this paper we report how such a dynamical system can be implemented using a counting process that naturally arises from sequential FN tunneling and integration of electrons on a floating-gate (FG) device. The self-compensating physics of the FG device makes the operation reliable and repeatable even when tunneling-currents approach levels below 1 attoamperes. We report measured results from different variants of fabricated prototypes, each of which shows an excellent match with the asymptotic prime number statistics. We also report similarities between the spectral signatures produced by the FN device and the spectral statistics of a hypothetical prime number sequence generator. We believe that the proposed floating-gate device could have future implications in understanding the process that generates prime numbers with applications in security and authentication.