ARDec 27, 2024
IMAGINE: An 8-to-1b 22nm FD-SOI Compute-In-Memory CNN Accelerator With an End-to-End Analog Charge-Based 0.15-8POPS/W Macro Featuring Distribution-Aware Data ReshapingAdrian Kneip, Martin Lefebvre, Pol Maistriaux et al.
Charge-domain compute-in-memory (CIM) SRAMs have recently become an enticing compromise between computing efficiency and accuracy to process sub-8b convolutional neural networks (CNNs) at the edge. Yet, they commonly make use of a fixed dot-product (DP) voltage swing, which leads to a loss in effective ADC bits due to data-dependent clipping or truncation effects that waste precious conversion energy and computing accuracy. To overcome this, we present IMAGINE, a workload-adaptive 1-to-8b CIM-CNN accelerator in 22nm FD-SOI. It introduces a 1152x256 end-to-end charge-based macro with a multi-bit DP based on an input-serial, weight-parallel accumulation that avoids power-hungry DACs. An adaptive swing is achieved by combining a channel-wise DP array split with a linear in-ADC implementation of analog batch-normalization (ABN), obtaining a distribution-aware data reshaping. Critical design constraints are relaxed by including the post-silicon equivalent noise within a CIM-aware CNN training framework. Measurement results showcase an 8b system-level energy efficiency of 40TOPS/W at 0.3/0.6V, with competitive accuracies on MNIST and CIFAR-10. Moreover, the peak energy and area efficiencies of the 187kB/mm2 macro respectively reach up to 0.15-8POPS/W and 2.6-154TOPS/mm2, scaling with the 8-to-1b computing precision. These results exceed previous charge-based designs by 3-to-5x while being the first work to provide linear in-memory rescaling.
NEJun 2, 2021
Bottom-up and top-down approaches for the design of neuromorphic processing systems: Tradeoffs and synergies between natural and artificial intelligenceCharlotte Frenkel, David Bol, Giacomo Indiveri
While Moore's law has driven exponential computing power expectations, its nearing end calls for new avenues for improving the overall system performance. One of these avenues is the exploration of alternative brain-inspired computing architectures that aim at achieving the flexibility and computational efficiency of biological neural processing systems. Within this context, neuromorphic engineering represents a paradigm shift in computing based on the implementation of spiking neural network architectures in which processing and memory are tightly co-located. In this paper, we provide a comprehensive overview of the field, highlighting the different levels of granularity at which this paradigm shift is realized and comparing design approaches that focus on replicating natural intelligence (bottom-up) versus those that aim at solving practical artificial intelligence applications (top-down). First, we present the analog, mixed-signal and digital circuit design styles, identifying the boundary between processing and memory through time multiplexing, in-memory computation, and novel devices. Then, we highlight the key tradeoffs for each of the bottom-up and top-down design approaches, survey their silicon implementations, and carry out detailed comparative analyses to extract design guidelines. Finally, we identify necessary synergies and missing elements required to achieve a competitive advantage for neuromorphic systems over conventional machine-learning accelerators in edge computing applications, and outline the key ingredients for a framework toward neuromorphic intelligence.
NEMay 13, 2020
A 28-nm Convolutional Neuromorphic Processor Enabling Online Learning with Spike-Based RetinasCharlotte Frenkel, Jean-Didier Legat, David Bol
In an attempt to follow biological information representation and organization principles, the field of neuromorphic engineering is usually approached bottom-up, from the biophysical models to large-scale integration in silico. While ideal as experimentation platforms for cognitive computing and neuroscience, bottom-up neuromorphic processors have yet to demonstrate an efficiency advantage compared to specialized neural network accelerators for real-world problems. Top-down approaches aim at answering this difficulty by (i) starting from the applicative problem and (ii) investigating how to make the associated algorithms hardware-efficient and biologically-plausible. In order to leverage the data sparsity of spike-based neuromorphic retinas for adaptive edge computing and vision applications, we follow a top-down approach and propose SPOON, a 28-nm event-driven CNN (eCNN). It embeds online learning with only 16.8-% power and 11.8-% area overheads with the biologically-plausible direct random target projection (DRTP) algorithm. With an energy per classification of 313nJ at 0.6V and a 0.32-mm$^2$ area for accuracies of 95.3% (on-chip training) and 97.5% (off-chip training) on MNIST, we demonstrate that SPOON reaches the efficiency of conventional machine learning accelerators while embedding on-chip learning and being compatible with event-based sensors, a point that we further emphasize with N-MNIST benchmarking.
MLSep 3, 2019
Learning without feedback: Fixed random learning signals allow for feedforward training of deep neural networksCharlotte Frenkel, Martin Lefebvre, David Bol
While the backpropagation of error algorithm enables deep neural network training, it implies (i) bidirectional synaptic weight transport and (ii) update locking until the forward and backward passes are completed. Not only do these constraints preclude biological plausibility, but they also hinder the development of low-cost adaptive smart sensors at the edge, as they severely constrain memory accesses and entail buffering overhead. In this work, we show that the one-hot-encoded labels provided in supervised classification problems, denoted as targets, can be viewed as a proxy for the error sign. Therefore, their fixed random projections enable a layerwise feedforward training of the hidden layers, thus solving the weight transport and update locking problems while relaxing the computational and memory requirements. Based on these observations, we propose the direct random target projection (DRTP) algorithm and demonstrate that it provides a tradeoff between accuracy and computational cost that is suitable for adaptive edge computing devices.
NEApr 17, 2019
MorphIC: A 65-nm 738k-Synapse/mm$^2$ Quad-Core Binary-Weight Digital Neuromorphic Processor with Stochastic Spike-Driven Online LearningCharlotte Frenkel, Jean-Didier Legat, David Bol
Recent trends in the field of neural network accelerators investigate weight quantization as a means to increase the resource- and power-efficiency of hardware devices. As full on-chip weight storage is necessary to avoid the high energy cost of off-chip memory accesses, memory reduction requirements for weight storage pushed toward the use of binary weights, which were demonstrated to have a limited accuracy reduction on many applications when quantization-aware training techniques are used. In parallel, spiking neural network (SNN) architectures are explored to further reduce power when processing sparse event-based data streams, while on-chip spike-based online learning appears as a key feature for applications constrained in power and resources during the training phase. However, designing power- and area-efficient spiking neural networks still requires the development of specific techniques in order to leverage on-chip online learning on binary weights without compromising the synapse density. In this work, we demonstrate MorphIC, a quad-core binary-weight digital neuromorphic processor embedding a stochastic version of the spike-driven synaptic plasticity (S-SDSP) learning rule and a hierarchical routing fabric for large-scale chip interconnection. The MorphIC SNN processor embeds a total of 2k leaky integrate-and-fire (LIF) neurons and more than two million plastic synapses for an active silicon area of 2.86mm$^2$ in 65nm CMOS, achieving a high density of 738k synapses/mm$^2$. MorphIC demonstrates an order-of-magnitude improvement in the area-accuracy tradeoff on the MNIST classification task compared to previously-proposed SNNs, while having no penalty in the energy-accuracy tradeoff.