Abhishek Moitra

LG
h-index38
20papers
272citations
Novelty53%
AI Score33

20 Papers

CVMar 30, 2023Code
XPert: Peripheral Circuit & Neural Architecture Co-search for Area and Energy-efficient Xbar-based Computing

Abhishek Moitra, Abhiroop Bhattacharjee, Youngeun Kim et al.

The hardware-efficiency and accuracy of Deep Neural Networks (DNNs) implemented on In-memory Computing (IMC) architectures primarily depend on the DNN architecture and the peripheral circuit parameters. It is therefore essential to holistically co-search the network and peripheral parameters to achieve optimal performance. To this end, we propose XPert, which co-searches network architecture in tandem with peripheral parameters such as the type and precision of analog-to-digital converters, crossbar column sharing and the layer-specific input precision using an optimization-based design space exploration. Compared to VGG16 baselines, XPert achieves 10.24x (4.7x) lower EDAP, 1.72x (1.62x) higher TOPS/W,1.93x (3x) higher TOPS/mm2 at 92.46% (56.7%) accuracy for CIFAR10 (TinyImagenet) datasets. The code for this paper is available at https://github.com/Intelligent-Computing-Lab-Yale/XPert.

NEOct 24, 2022
SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks

Abhishek Moitra, Abhiroop Bhattacharjee, Runcong Kuang et al.

SNNs are an active research domain towards energy efficient machine intelligence. Compared to conventional ANNs, SNNs use temporal spike data and bio-plausible neuronal activation functions such as Leaky-Integrate Fire/Integrate Fire (LIF/IF) for data processing. However, SNNs incur significant dot-product operations causing high memory and computation overhead in standard von-Neumann computing platforms. Today, In-Memory Computing (IMC) architectures have been proposed to alleviate the "memory-wall bottleneck" prevalent in von-Neumann architectures. Although recent works have proposed IMC-based SNN hardware accelerators, the following have been overlooked- 1) the adverse effects of crossbar non-ideality on SNN performance due to repeated analog dot-product operations over multiple time-steps, 2) hardware overheads of essential SNN-specific components such as the LIF/IF and data communication modules. To this end, we propose SpikeSim, a tool that can perform realistic performance, energy, latency and area evaluation of IMC-mapped SNNs. SpikeSim consists of a practical monolithic IMC architecture called SpikeFlow for mapping SNNs. Additionally, the non-ideality computation engine (NICE) and energy-latency-area (ELA) engine performs hardware-realistic evaluation of SpikeFlow-mapped SNNs. Based on 65nm CMOS implementation and experiments on CIFAR10, CIFAR100 and TinyImagenet datasets, we find that the LIF/IF neuronal module has significant area contribution (>11% of the total hardware area). We propose SNN topological modifications leading to 1.24x and 10x reduction in the neuronal module's area and the overall energy-delay-product value, respectively. Furthermore, in this work, we perform a holistic comparison between IMC implemented ANN and SNNs and conclude that lower number of time-steps are the key to achieve higher throughput and energy-efficiency for SNNs compared to 4-bit ANNs.

NEAug 22, 2024
When In-memory Computing Meets Spiking Neural Networks -- A Perspective on Device-Circuit-System-and-Algorithm Co-design

Abhishek Moitra, Abhiroop Bhattacharjee, Yuhang Li et al.

This review explores the intersection of bio-plausible artificial intelligence in the form of Spiking Neural Networks (SNNs) with the analog In-Memory Computing (IMC) domain, highlighting their collective potential for low-power edge computing environments. Through detailed investigation at the device, circuit, and system levels, we highlight the pivotal synergies between SNNs and IMC architectures. Additionally, we emphasize the critical need for comprehensive system-level analyses, considering the inter-dependencies between algorithms, devices, circuit & system parameters, crucial for optimal performance. An in-depth analysis leads to identification of key system-level bottlenecks arising from device limitations which can be addressed using SNN-specific algorithm-hardware co-design techniques. This review underscores the imperative for holistic device to system design space co-exploration, highlighting the critical aspects of hardware and algorithm research endeavors for low-power neuromorphic solutions.

LGApr 11, 2022
MIME: Adapting a Single Neural Network for Multi-task Inference with Memory-efficient Dynamic Pruning

Abhiroop Bhattacharjee, Yeshwanth Venkatesha, Abhishek Moitra et al.

Recent years have seen a paradigm shift towards multi-task learning. This calls for memory and energy-efficient solutions for inference in a multi-task scenario. We propose an algorithm-hardware co-design approach called MIME. MIME reuses the weight parameters of a trained parent task and learns task-specific threshold parameters for inference on multiple child tasks. We find that MIME results in highly memory-efficient DRAM storage of neural-network parameters for multiple tasks compared to conventional multi-task inference. In addition, MIME results in input-dependent dynamic neuronal pruning, thereby enabling energy-efficient inference with higher throughput on a systolic-array hardware. Our experiments with benchmark datasets (child tasks)- CIFAR10, CIFAR100, and Fashion-MNIST, show that MIME achieves ~3.48x memory-efficiency and ~2.4-3.1x energy-savings compared to conventional multi-task inference in Pipelined task mode.

LGFeb 9, 2023
DeepCAM: A Fully CAM-based Inference Accelerator with Variable Hash Lengths for Energy-efficient Deep Neural Networks

Duy-Thanh Nguyen, Abhiroop Bhattacharjee, Abhishek Moitra et al.

With ever increasing depth and width in deep neural networks to achieve state-of-the-art performance, deep learning computation has significantly grown, and dot-products remain dominant in overall computation time. Most prior works are built on conventional dot-product where weighted input summation is used to represent the neuron operation. However, another implementation of dot-product based on the notion of angles and magnitudes in the Euclidean space has attracted limited attention. This paper proposes DeepCAM, an inference accelerator built on two critical innovations to alleviate the computation time bottleneck of convolutional neural networks. The first innovation is an approximate dot-product built on computations in the Euclidean space that can replace addition and multiplication with simple bit-wise operations. The second innovation is a dynamic size content addressable memory-based (CAM-based) accelerator to perform bit-wise operations and accelerate the CNNs with a lower computation time. Our experiments on benchmark image recognition datasets demonstrate that DeepCAM is up to 523x and 3498x faster than Eyeriss and traditional CPUs like Intel Skylake, respectively. Furthermore, the energy consumed by our DeepCAM approach is 2.16x to 109x less compared to Eyeriss.

AIAug 22, 2024
TReX- Reusing Vision Transformer's Attention for Efficient Xbar-based Computing

Abhishek Moitra, Abhiroop Bhattacharjee, Youngeun Kim et al.

Due to the high computation overhead of Vision Transformers (ViTs), In-memory Computing architectures are being researched towards energy-efficient deployment in edge-computing scenarios. Prior works have proposed efficient algorithm-hardware co-design and IMC-architectural improvements to improve the energy-efficiency of IMC-implemented ViTs. However, all prior works have neglected the overhead and co-depencence of attention blocks on the accuracy-energy-delay-area of IMC-implemented ViTs. To this end, we propose TReX- an attention-reuse-driven ViT optimization framework that effectively performs attention reuse in ViT models to achieve optimal accuracy-energy-delay-area tradeoffs. TReX optimally chooses the transformer encoders for attention reuse to achieve near iso-accuracy performance while meeting the user-specified delay requirement. Based on our analysis on the Imagenet-1k dataset, we find that TReX achieves 2.3x (2.19x) EDAP reduction and 1.86x (1.79x) TOPS/mm2 improvement with ~1% accuracy drop in case of DeiT-S (LV-ViT-S) ViT models. Additionally, TReX achieves high accuracy at high EDAP reduction compared to state-of-the-art token pruning and weight sharing approaches. On NLP tasks such as CoLA, TReX leads to 2% higher non-ideal accuracy compared to baseline at 1.6x lower EDAP.

CRSep 5, 2023
RobustEdge: Low Power Adversarial Detection for Cloud-Edge Systems

Abhishek Moitra, Abhiroop Bhattacharjee, Youngeun Kim et al.

In practical cloud-edge scenarios, where a resource constrained edge performs data acquisition and a cloud system (having sufficient resources) performs inference tasks with a deep neural network (DNN), adversarial robustness is critical for reliability and ubiquitous deployment. Adversarial detection is a prime adversarial defence technique used in prior literature. However, in prior detection works, the detector is attached to the classifier model and both detector and classifier work in tandem to perform adversarial detection that requires a high computational overhead which is not available at the low-power edge. Therefore, prior works can only perform adversarial detection at the cloud and not at the edge. This means that in case of adversarial attacks, the unfavourable adversarial samples must be communicated to the cloud which leads to energy wastage at the edge device. Therefore, a low-power edge-friendly adversarial detection method is required to improve the energy efficiency of the edge and robustness of the cloud-based classifier. To this end, RobustEdge proposes Quantization-enabled Energy Separation (QES) training with "early detection and exit" to perform edge-based low cost adversarial detection. The QES-trained detector implemented at the edge blocks adversarial data transmission to the classifier model, thereby improving adversarial robustness and energy-efficiency of the Cloud-Edge system.

LGFeb 15, 2023
XploreNAS: Explore Adversarially Robust & Hardware-efficient Neural Architectures for Non-ideal Xbars

Abhiroop Bhattacharjee, Abhishek Moitra, Priyadarshini Panda

Compute In-Memory platforms such as memristive crossbars are gaining focus as they facilitate acceleration of Deep Neural Networks (DNNs) with high area and compute-efficiencies. However, the intrinsic non-idealities associated with the analog nature of computing in crossbars limits the performance of the deployed DNNs. Furthermore, DNNs are shown to be vulnerable to adversarial attacks leading to severe security threats in their large-scale deployment. Thus, finding adversarially robust DNN architectures for non-ideal crossbars is critical to the safe and secure deployment of DNNs on the edge. This work proposes a two-phase algorithm-hardware co-optimization approach called XploreNAS that searches for hardware-efficient & adversarially robust neural architectures for non-ideal crossbar platforms. We use the one-shot Neural Architecture Search (NAS) approach to train a large Supernet with crossbar-awareness and sample adversarially robust Subnets therefrom, maintaining competitive hardware-efficiency. Our experiments on crossbars with benchmark datasets (SVHN, CIFAR10 & CIFAR100) show upto ~8-16% improvement in the adversarial robustness of the searched Subnets against a baseline ResNet-18 model subjected to crossbar-aware adversarial training. We benchmark our robust Subnets for Energy-Delay-Area-Products (EDAPs) using the Neurosim tool and find that with additional hardware-efficiency driven optimizations, the Subnets attain ~1.5-1.6x lower EDAPs than ResNet-18 baseline.

CVFeb 9, 2022Code
Adversarial Detection without Model Information

Abhishek Moitra, Youngeun Kim, Priyadarshini Panda

Prior state-of-the-art adversarial detection works are classifier model dependent, i.e., they require classifier model outputs and parameters for training the detector or during adversarial detection. This makes their detection approach classifier model specific. Furthermore, classifier model outputs and parameters might not always be accessible. To this end, we propose a classifier model independent adversarial detection method using a simple energy function to distinguish between adversarial and natural inputs. We train a standalone detector independent of the classifier model, with a layer-wise energy separation (LES) training to increase the separation between natural and adversarial energies. With this, we perform energy distribution-based adversarial detection. Our method achieves comparable performance with state-of-the-art detection works (ROC-AUC > 0.9) across a wide range of gradient, score and gaussian noise attacks on CIFAR10, CIFAR100 and TinyImagenet datasets. Furthermore, compared to prior works, our detection approach is light-weight, requires less amount of training data (40% of the actual dataset) and is transferable across different datasets. For reproducibility, we provide layer-wise energy separation training code at https://github.com/Intelligent-Computing-Lab-Yale/Energy-Separation-Training

NEJan 31, 2022Code
Rate Coding or Direct Coding: Which One is Better for Accurate, Robust, and Energy-efficient Spiking Neural Networks?

Youngeun Kim, Hyoungseob Park, Abhishek Moitra et al.

Recent Spiking Neural Networks (SNNs) works focus on an image classification task, therefore various coding techniques have been proposed to convert an image into temporal binary spikes. Among them, rate coding and direct coding are regarded as prospective candidates for building a practical SNN system as they show state-of-the-art performance on large-scale datasets. Despite their usage, there is little attention to comparing these two coding schemes in a fair manner. In this paper, we conduct a comprehensive analysis of the two codings from three perspectives: accuracy, adversarial robustness, and energy-efficiency. First, we compare the performance of two coding techniques with various architectures and datasets. Then, we measure the robustness of the coding techniques on two adversarial attack methods. Finally, we compare the energy-efficiency of two coding schemes on a digital hardware platform. Our results show that direct coding can achieve better accuracy especially for a small number of timesteps. In contrast, rate coding shows better robustness to adversarial attacks owing to the non-differentiable spike generation process. Rate coding also yields higher energy-efficiency than direct coding which requires multi-bit precision for the first layer. Our study explores the characteristics of two codings, which is an important design consideration for building SNNs. The code is made available at https://github.com/Intelligent-Computing-Lab-Yale/Rate-vs-Direct.

CRJun 22, 2021Code
DetectX -- Adversarial Input Detection using Current Signatures in Memristive XBar Arrays

Abhishek Moitra, Priyadarshini Panda

Adversarial input detection has emerged as a prominent technique to harden Deep Neural Networks(DNNs) against adversarial attacks. Most prior works use neural network-based detectors or complex statistical analysis for adversarial detection. These approaches are computationally intensive and vulnerable to adversarial attacks. To this end, we propose DetectX - a hardware friendly adversarial detection mechanism using hardware signatures like Sum of column Currents (SoI) in memristive crossbars (XBar). We show that adversarial inputs have higher SoI compared to clean inputs. However, the difference is too small for reliable adversarial detection. Hence, we propose a dual-phase training methodology: Phase1 training is geared towards increasing the separation between clean and adversarial SoIs; Phase2 training improves the overall robustness against different strengths of adversarial attacks. For hardware-based adversarial detection, we implement the DetectX module using 32nm CMOS circuits and integrate it with a Neurosim-like analog crossbar architecture. We perform hardware evaluation of the Neurosim+DetectX system on the Neurosim platform using datasets-CIFAR10(VGG8), CIFAR100(VGG16) and TinyImagenet(ResNet18). Our experiments show that DetectX is 10x-25x more energy efficient and immune to dynamic adversarial attacks compared to previous state-of-the-art works. Moreover, we achieve high detection performance (ROC-AUC > 0.95) for strong white-box and black-box attacks. The code has been released at https://github.com/Intelligent-Computing-Lab-Yale/DetectX

LGFeb 4, 2024
ClipFormer: Key-Value Clipping of Transformers on Memristive Crossbars for Write Noise Mitigation

Abhiroop Bhattacharjee, Abhishek Moitra, Priyadarshini Panda

Transformers have revolutionized various real-world applications from natural language processing to computer vision. However, traditional von-Neumann computing paradigm faces memory and bandwidth limitations in accelerating transformers owing to their massive model sizes. To this end, In-memory Computing (IMC) crossbars based on Non-volatile Memories (NVMs), due to their ability to perform highly parallelized Matrix-Vector-Multiplications (MVMs) with high energy-efficiencies, have emerged as a promising solution for accelerating transformers. However, analog MVM operations in crossbars introduce non-idealities, such as stochastic read & write noise, which affect the inference accuracy of the deployed transformers. Specifically, we find pre-trained Vision Transformers (ViTs) to be vulnerable on crossbars due to the impact of write noise on the dynamically-generated Key (K) and Value (V) matrices in the attention layers, an effect not accounted for in prior studies. We, thus, propose ClipFormer, a transformation on the K and V matrices during inference, to boost the non-ideal accuracies of pre-trained ViT models. ClipFormer requires no additional hardware and training overhead and is amenable to transformers deployed on any memristive crossbar platform. Our experiments on Imagenet-1k dataset using pre-trained DeiT-S transformers, subjected to standard training and variation-aware-training, show >10-40% higher non-ideal accuracies at the high write noise regime by applying ClipFormer.

ARFeb 14, 2025
MEADOW: Memory-efficient Dataflow and Data Packing for Low Power Edge LLMs

Abhishek Moitra, Arkapravo Ghosh, Shrey Agarwal et al.

The computational and memory challenges of large language models (LLMs) have sparked several optimization approaches towards their efficient implementation. While prior LLM-targeted quantization, and prior works on sparse acceleration have significantly mitigated the memory and computation bottleneck, they do so assuming high power platforms such as GPUs and server-class FPGAs with large off-chip memory bandwidths and employ a generalized matrix multiplication (GEMM) execution of all the layers in the decoder. In such a GEMM-based execution, data is fetched from an off-chip memory, computed and stored back. However, at reduced off-chip memory capacities, as is the case with low-power edge devices, this implementation strategy significantly increases the attention computation latency owing to the repeated storage and fetch of large intermediate tokens to and from the off-chip memory. Moreover, fetching the weight matrices from a bandwidth constrained memory further aggravates the memory bottleneck problem. To this end, we introduce MEADOW, a framework that significantly reduces the off-chip memory access for LLMs with a novel token-parallel head-sequential (TPHS) dataflow. Additionally, MEADOW applies weight packing that performs loss-less decomposition of large weight matrices to their unique elements thereby, reducing the enormous weight fetch latency. MEADOW demonstrates 1.5x and 2.5x lower decode and prefill latency, respectively, compared to a GEMM-based LLM implementation on the low power Xilinx ZCU102 FPGA platform that consumes less than 10W. Additionally, MEADOW achieves an end-to-end latency improvement of over 40%, compared to prior LLM optimization works.

LGMay 28, 2023
Examining the Role and Limits of Batchnorm Optimization to Mitigate Diverse Hardware-noise in In-memory Computing

Abhiroop Bhattacharjee, Abhishek Moitra, Youngeun Kim et al.

In-Memory Computing (IMC) platforms such as analog crossbars are gaining focus as they facilitate the acceleration of low-precision Deep Neural Networks (DNNs) with high area- & compute-efficiencies. However, the intrinsic non-idealities in crossbars, which are often non-deterministic and non-linear, degrade the performance of the deployed DNNs. In addition to quantization errors, most frequently encountered non-idealities during inference include crossbar circuit-level parasitic resistances and device-level non-idealities such as stochastic read noise and temporal drift. In this work, our goal is to closely examine the distortions caused by these non-idealities on the dot-product operations in analog crossbars and explore the feasibility of a nearly training-less solution via crossbar-aware fine-tuning of batchnorm parameters in real-time to mitigate the impact of the non-idealities. This enables reduction in hardware costs in terms of memory and training energy for IMC noise-aware retraining of the DNN weights on crossbars.

NEMay 27, 2023
Input-Aware Dynamic Timestep Spiking Neural Networks for Efficient In-Memory Computing

Yuhang Li, Abhishek Moitra, Tamar Geller et al.

Spiking Neural Networks (SNNs) have recently attracted widespread research interest as an efficient alternative to traditional Artificial Neural Networks (ANNs) because of their capability to process sparse and binary spike information and avoid expensive multiplication operations. Although the efficiency of SNNs can be realized on the In-Memory Computing (IMC) architecture, we show that the energy cost and latency of SNNs scale linearly with the number of timesteps used on IMC hardware. Therefore, in order to maximize the efficiency of SNNs, we propose input-aware Dynamic Timestep SNN (DT-SNN), a novel algorithmic solution to dynamically determine the number of timesteps during inference on an input-dependent basis. By calculating the entropy of the accumulated output after each timestep, we can compare it to a predefined threshold and decide if the information processed at the current timestep is sufficient for a confident prediction. We deploy DT-SNN on an IMC architecture and show that it incurs negligible computational overhead. We demonstrate that our method only uses 1.46 average timesteps to achieve the accuracy of a 4-timestep static SNN while reducing the energy-delay-product by 80%.

CVMay 26, 2023
Do We Really Need a Large Number of Visual Prompts?

Youngeun Kim, Yuhang Li, Abhishek Moitra et al.

Due to increasing interest in adapting models on resource-constrained edges, parameter-efficient transfer learning has been widely explored. Among various methods, Visual Prompt Tuning (VPT), prepending learnable prompts to input space, shows competitive fine-tuning performance compared to training of full network parameters. However, VPT increases the number of input tokens, resulting in additional computational overhead. In this paper, we analyze the impact of the number of prompts on fine-tuning performance and self-attention operation in a vision transformer architecture. Through theoretical and empirical analysis we show that adding more prompts does not lead to linear performance improvement. Further, we propose a Prompt Condensation (PC) technique that aims to prevent performance degradation from using a small number of prompts. We validate our methods on FGVC and VTAB-1k tasks and show that our approach reduces the number of prompts by ~70% while maintaining accuracy.

LGMay 9, 2021
Efficiency-driven Hardware Optimization for Adversarially Robust Neural Networks

Abhiroop Bhattacharjee, Abhishek Moitra, Priyadarshini Panda

With a growing need to enable intelligence in embedded devices in the Internet of Things (IoT) era, secure hardware implementation of Deep Neural Networks (DNNs) has become imperative. We will focus on how to address adversarial robustness for DNNs through efficiency-driven hardware optimizations. Since memory (specifically, dot-product operations) is a key energy-spending component for DNNs, hardware approaches in the past have focused on optimizing the memory. One such approach is approximate digital CMOS memories with hybrid 6T-8T SRAM cells that enable supply voltage (Vdd) scaling yielding low-power operation, without significantly affecting the performance due to read/write failures incurred in the 6T cells. In this paper, we show how the bit-errors in the 6T cells of hybrid 6T-8T memories minimize the adversarial perturbations in a DNN. Essentially, we find that for different configurations of 8T-6T ratios and scaledVdd operation, noise incurred in the hybrid memory architectures is bound within specific limits. This hardware noise can potentially interfere in the creation of adversarial attacks in DNNs yielding robustness. Another memory optimization approach involves using analog memristive crossbars that perform Matrix-Vector-Multiplications (MVMs) efficiently with low energy and area requirements. However, crossbars generally suffer from intrinsic non-idealities that cause errors in performing MVMs, leading to degradation in the accuracy of the DNNs. We will show how the intrinsic hardware variations manifested through crossbar non-idealities yield adversarial robustness to the mapped DNNs without any additional optimization.

LGJan 12, 2021
Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks

Karina Vasquez, Yeshwanth Venkatesha, Abhiroop Bhattacharjee et al.

As neural networks gain widespread adoption in embedded devices, there is a need for model compression techniques to facilitate deployment in resource-constrained environments. Quantization is one of the go-to methods yielding state-of-the-art model compression. Most approaches take a fully trained model, apply different heuristics to determine the optimal bit-precision for different layers of the network, and retrain the network to regain any drop in accuracy. Based on Activation Density (AD)-the proportion of non-zero activations in a layer-we propose an in-training quantization method. Our method calculates bit-width for each layer during training yielding a mixed precision model with competitive accuracy. Since we train lower precision models during training, our approach yields the final quantized model at lower training complexity and also eliminates the need for re-training. We run experiments on benchmark datasets like CIFAR-10, CIFAR-100, TinyImagenet on VGG19/ResNet18 architectures and report the accuracy and energy estimates for the same. We achieve ~4.5x benefit in terms of estimated multiply-and-accumulate (MAC) reduction while reducing the training complexity by 50% in our experiments. To further evaluate the energy benefits of our proposed method, we develop a mixed-precision scalable Process In Memory (PIM) hardware accelerator platform. The hardware platform incorporates shift-add functionality for handling multi-bit precision neural network models. Evaluating the quantized models obtained with our proposed method on the PIM platform yields ~5x energy reduction compared to 16-bit models. Additionally, we find that integrating AD based quantization with AD based pruning (both conducted during training) yields up to ~198x and ~44x energy reductions for VGG19 and ResNet18 architectures respectively on PIM platform compared to baseline 16-bit precision, unpruned models.

CVJan 5, 2021
Noise Sensitivity-Based Energy Efficient and Robust Adversary Detection in Neural Networks

Rachel Sterneck, Abhishek Moitra, Priyadarshini Panda

Neural networks have achieved remarkable performance in computer vision, however they are vulnerable to adversarial examples. Adversarial examples are inputs that have been carefully perturbed to fool classifier networks, while appearing unchanged to humans. Based on prior works on detecting adversaries, we propose a structured methodology of augmenting a deep neural network (DNN) with a detector subnetwork. We use $\textit{Adversarial Noise Sensitivity}$ (ANS), a novel metric for measuring the adversarial gradient contribution of different intermediate layers of a network. Based on the ANS value, we append a detector to the most sensitive layer. In prior works, more complex detectors were added to a DNN, increasing the inference computational cost of the model. In contrast, our structured and strategic addition of a detector to a DNN reduces the complexity of the model while making the overall network adversarially resilient. Through comprehensive white-box and black-box experiments on MNIST, CIFAR-10, and CIFAR-100, we show that our method improves state-of-the-art detector robustness against adversarial examples. Furthermore, we validate the energy efficiency of our proposed adversarial detection methodology through an extensive energy analysis on various hardware scalable CMOS accelerator platforms. We also demonstrate the effects of quantization on our detector-appended networks.

CRNov 26, 2020
Exposing the Robustness and Vulnerability of Hybrid 8T-6T SRAM Memory Architectures to Adversarial Attacks in Deep Neural Networks

Abhishek Moitra, Priyadarshini Panda

Deep Learning is able to solve a plethora of once impossible problems. However, they are vulnerable to input adversarial attacks preventing them from being autonomously deployed in critical applications. Several algorithm-centered works have discussed methods to cause adversarial attacks and improve adversarial robustness of a Deep Neural Network (DNN). In this work, we elicit the advantages and vulnerabilities of hybrid 6T-8T memories to improve the adversarial robustness and cause adversarial attacks on DNNs. We show that bit-error noise in hybrid memories due to erroneous 6T-SRAM cells have deterministic behaviour based on the hybrid memory configurations (V_DD, 8T-6T ratio). This controlled noise (surgical noise) can be strategically introduced into specific DNN layers to improve the adversarial accuracy of DNNs. At the same time, surgical noise can be carefully injected into the DNN parameters stored in hybrid memory to cause adversarial attacks. To improve the adversarial robustness of DNNs using surgical noise, we propose a methodology to select appropriate DNN layers and their corresponding hybrid memory configurations to introduce the required surgical noise. Using this, we achieve 2-8% higher adversarial accuracy without re-training against white-box attacks like FGSM, than the baseline models (with no surgical noise introduced). To demonstrate adversarial attacks using surgical noise, we design a novel, white-box attack on DNN parameters stored in hybrid memory banks that causes the DNN inference accuracy to drop by more than 60% with over 90% confidence value. We support our claims with experiments, performed using benchmark datasets-CIFAR10 and CIFAR100 on VGG19 and ResNet18 networks.