Matthew J. Marinella

12papers

260citations

Novelty49%

AI Score41

Ranked #92,696 of 201,326 authors (top 46%)#245 in NE (top 22%)

12 Papers

98.5ETApr 2

A self-heating electrochemical cell with nine decades of programmable linear resistance

Adam L. Gross, Sangheon Oh, Minseong Park et al.

A programmable linear resistor with a compact footprint would have profound implications for microelectronics, enabling efficient in-sensor analog signal processing and in-memory computing. Non-volatile memory offers a potential solution but suffers from limitations due to the programming mechanisms that confine switching to nanoscale constrictions or field-sensitive semiconductor junctions, leading to non-linear current-voltage relationships and errors. Here, we introduce a tunable resistor that is programmed into non-volatile, high-precision resistance states spanning nine orders of magnitude, with linear current-voltage characteristics across the entire range -- significantly improving the performance and widening the application space of resistive memory. A key advance is an electrothermal gate that simultaneously spreads heat and electrochemical reactions during programming to enable large, bulk composition modulation. The volumetric modulation can host thousands of linear resistance states with 100x lower conductance errors than other memory. This enables direct processing of analog signals with high fidelity, and we demonstrate variable-gain amplification, division, and multiplication. Integration with CMOS is used to show resilience to electrical and thermal disturb in arrays and to demonstrate retention of analog levels at <1% average loss for more than 2 months across 100 devices. Simulations indicate matrix multiplication efficiency could approach >1,000 TOPS/W.

MES-HALLNov 22, 2021

Shape-Dependent Multi-Weight Magnetic Artificial Synapses for Neuromorphic Computing

Thomas Leonard, Samuel Liu, Mahshid Alamdar et al.

In neuromorphic computing, artificial synapses provide a multi-weight conductance state that is set based on inputs from neurons, analogous to the brain. Additional properties of the synapse beyond multiple weights can be needed, and can depend on the application, requiring the need for generating different synapse behaviors from the same materials. Here, we measure artificial synapses based on magnetic materials that use a magnetic tunnel junction and a magnetic domain wall. By fabricating lithographic notches in a domain wall track underneath a single magnetic tunnel junction, we achieve 4-5 stable resistance states that can be repeatably controlled electrically using spin orbit torque. We analyze the effect of geometry on the synapse behavior, showing that a trapezoidal device has asymmetric weight updates with high controllability, while a straight device has higher stochasticity, but with stable resistance levels. The device data is input into neuromorphic computing simulators to show the usefulness of application-specific synaptic functions. Implementing an artificial neural network applied on streamed Fashion-MNIST data, we show that the trapezoidal magnetic synapse can be used as a metaplastic function for efficient online learning. Implementing a convolutional neural network for CIFAR-100 image recognition, we show that the straight magnetic synapse achieves near-ideal inference accuracy, due to the stability of its resistance levels. This work shows multi-weight magnetic synapses are a feasible technology for neuromorphic computing and provides design guidelines for emerging artificial synapse technologies.

ARSep 3, 2021

On the Accuracy of Analog Neural Network Inference Accelerators

T. Patrick Xiao, Ben Feinberg, Christopher H. Bennett et al.

Specialized accelerators have recently garnered attention as a method to reduce the power consumption of neural network inference. A promising category of accelerators utilizes nonvolatile memory arrays to both store weights and perform $\textit{in situ}$ analog computation inside the array. While prior work has explored the design space of analog accelerators to optimize performance and energy efficiency, there is seldom a rigorous evaluation of the accuracy of these accelerators. This work shows how architectural design decisions, particularly in mapping neural network parameters to analog memory cells, influence inference accuracy. When evaluated using ResNet50 on ImageNet, the resilience of the system to analog non-idealities - cell programming errors, analog-to-digital converter resolution, and array parasitic resistances - all improve when analog quantities in the hardware are made proportional to the weights in the network. Moreover, contrary to the assumptions of prior work, nearly equivalent resilience to cell imprecision can be achieved by fully storing weights as analog quantities, rather than spreading weight bits across multiple devices, often referred to as bit slicing. By exploiting proportionality, analog system designers have the freedom to match the precision of the hardware to the needs of the algorithm, rather than attempting to guarantee the same level of precision in the intermediate results as an equivalent digital accelerator. This ultimately results in an analog accelerator that is more accurate, more robust to analog errors, and more energy-efficient.

NEJul 5, 2021

High-Speed CMOS-Free Purely Spintronic Asynchronous Recurrent Neural Network

Pranav O. Mathews, Christian B. Duffee, Abel Thayil et al.

Neuromorphic computing systems overcome the limitations of traditional von Neumann computing architectures. These computing systems can be further improved upon by using emerging technologies that are more efficient than CMOS for neural computation. Recent research has demonstrated memristors and spintronic devices in various neural network designs boost efficiency and speed. This paper presents a biologically inspired fully spintronic neuron used in a fully spintronic Hopfield RNN. The network is used to solve tasks, and the results are compared against those of current Hopfield neuromorphic architectures which use emerging technologies.

MES-HALLJan 8, 2021

Controllable reset behavior in domain wall-magnetic tunnel junction artificial neurons for task-adaptable computation

Samuel Liu, Christopher H. Bennett, Joseph S. Friedman et al.

Neuromorphic computing with spintronic devices has been of interest due to the limitations of CMOS-driven von Neumann computing. Domain wall-magnetic tunnel junction (DW-MTJ) devices have been shown to be able to intrinsically capture biological neuron behavior. Edgy-relaxed behavior, where a frequently firing neuron experiences a lower action potential threshold, may provide additional artificial neuronal functionality when executing repeated tasks. In this study, we demonstrate that this behavior can be implemented in DW-MTJ artificial neurons via three alternative mechanisms: shape anisotropy, magnetic field, and current-driven soft reset. Using micromagnetics and analytical device modeling to classify the Optdigits handwritten digit dataset, we show that edgy-relaxed behavior improves both classification accuracy and classification rate for ordered datasets while sacrificing little to no accuracy for a randomized dataset. This work establishes methods by which artificial spintronic neurons can be flexibly adapted to datasets.

NENov 11, 2020

Domain Wall Leaky Integrate-and-Fire Neurons with Shape-Based Configurable Activation Functions

Wesley H. Brigner, Naimul Hassan, Xuan Hu et al.

Complementary metal oxide semiconductor (CMOS) devices display volatile characteristics, and are not well suited for analog applications such as neuromorphic computing. Spintronic devices, on the other hand, exhibit both non-volatile and analog features, which are well-suited to neuromorphic computing. Consequently, these novel devices are at the forefront of beyond-CMOS artificial intelligence applications. However, a large quantity of these artificial neuromorphic devices still require the use of CMOS, which decreases the efficiency of the system. To resolve this, we have previously proposed a number of artificial neurons and synapses that do not require CMOS for operation. Although these devices are a significant improvement over previous renditions, their ability to enable neural network learning and recognition is limited by their intrinsic activation functions. This work proposes modifications to these spintronic neurons that enable configuration of the activation functions through control of the shape of a magnetic domain wall track. Linear and sigmoidal activation functions are demonstrated in this work, which can be extended through a similar approach to enable a wide variety of activation functions.

NEApr 2, 2020

Device-aware inference operations in SONOS nonvolatile memory arrays

Christopher H. Bennett, T. Patrick Xiao, Ryan Dellana et al.

Non-volatile memory arrays can deploy pre-trained neural network models for edge inference. However, these systems are affected by device-level noise and retention issues. Here, we examine damage caused by these effects, introduce a mitigation strategy, and demonstrate its use in fabricated array of SONOS (Silicon-Oxide-Nitride-Oxide-Silicon) devices. On MNIST, fashion-MNIST, and CIFAR-10 tasks, our approach increases resilience to synaptic noise and drift. We also show strong performance can be realized with ADCs of 5-8 bits precision.

NEMar 24, 2020

Unsupervised Competitive Hardware Learning Rule for Spintronic Clustering Architecture

Alvaro Velasquez, Christopher H. Bennett, Naimul Hassan et al.

We propose a hardware learning rule for unsupervised clustering within a novel spintronic computing architecture. The proposed approach leverages the three-terminal structure of domain-wall magnetic tunnel junction devices to establish a feedback loop that serves to train such devices when they are used as synapses in a neuromorphic computing architecture.

NEMar 4, 2020

Plasticity-Enhanced Domain-Wall MTJ Neural Networks for Energy-Efficient Online Learning

Christopher H. Bennett, T. Patrick Xiao, Can Cui et al.

Machine learning implements backpropagation via abundant training samples. We demonstrate a multi-stage learning system realized by a promising non-volatile memory device, the domain-wall magnetic tunnel junction (DW-MTJ). The system consists of unsupervised (clustering) as well as supervised sub-systems, and generalizes quickly (with few samples). We demonstrate interactions between physical properties of this device and optimal implementation of neuroscience-inspired plasticity learning rules, and highlight performance on a suite of tasks. Our energy analysis confirms the value of the approach, as the learning budget stays below 20 $μJ$ even for large tasks used typically in machine learning.

NEFeb 25, 2020

Evaluating complexity and resilience trade-offs in emerging memory inference machines

Christopher H. Bennett, Ryan Dellana, T. Patrick Xiao et al.

Neuromorphic-style inference only works well if limited hardware resources are maximized properly, e.g. accuracy continues to scale with parameters and complexity in the face of potential disturbance. In this work, we use realistic crossbar simulations to highlight that compact implementations of deep neural networks are unexpectedly susceptible to collapse from multiple system disturbances. Our work proposes a middle path towards high performance and strong resilience utilizing the Mosaics framework, and specifically by re-using synaptic connections in a recurrent neural network implementation that possesses a natural form of noise-immunity.

NEFeb 3, 2020

CMOS-Free Multilayer Perceptron Enabled by Four-Terminal MTJ Device

Wesley H. Brigner, Naimul Hassan, Xuan Hu et al.

Neuromorphic computing promises revolutionary improvements over conventional systems for applications that process unstructured information. To fully realize this potential, neuromorphic systems should exploit the biomimetic behavior of emerging nanodevices. In particular, exceptional opportunities are provided by the non-volatility and analog capabilities of spintronic devices. While spintronic devices have previously been proposed that emulate neurons and synapses, complementary metal-oxide-semiconductor (CMOS) devices are required to implement multilayer spintronic perceptron crossbars. This work therefore proposes a new spintronic neuron that enables purely spintronic multilayer perceptrons, eliminating the need for CMOS circuitry and simplifying fabrication.

ARJul 31, 2017

Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator

Matthew J. Marinella, Sapan Agarwal, Alexander Hsia et al.

Neural networks are an increasingly attractive algorithm for natural language processing and pattern recognition. Deep networks with >50M parameters are made possible by modern GPU clusters operating at <50 pJ per op and more recently, production accelerators capable of <5pJ per operation at the board level. However, with the slowing of CMOS scaling, new paradigms will be required to achieve the next several orders of magnitude in performance per watt gains. Using an analog resistive memory (ReRAM) crossbar to perform key matrix operations in an accelerator is an attractive option. This work presents a detailed design using a state of the art 14/16 nm PDK for of an analog crossbar circuit block designed to process three key kernels required in training and inference of neural networks. A detailed circuit and device-level analysis of energy, latency, area, and accuracy are given and compared to relevant designs using standard digital ReRAM and SRAM operations. It is shown that the analog accelerator has a 270x energy and 540x latency advantage over a similar block utilizing only digital ReRAM and takes only 11 fJ per multiply and accumulate (MAC). Compared to an SRAM based accelerator, the energy is 430X better and latency is 34X better. Although training accuracy is degraded in the analog accelerator, several options to improve this are presented. The possible gains over a similar digital-only version of this accelerator block suggest that continued optimization of analog resistive memories is valuable. This detailed circuit and device analysis of a training accelerator may serve as a foundation for further architecture-level studies.