Kanad Basu

CR
h-index: 23
18 papers
335 citations
Novelty: 51%
AI Score: 54

18 Papers

CY · Mar 18 · Code
GUIDE: GenAI Units In Digital Design Education

Weihua Xiao, Jason Blocklove, Matthew DeLorenzo et al. · stanford

GenAI Units In Digital Design Education (GUIDE) is an open courseware repository with runnable Google Colab labs and other materials. We describe the repository's architecture and educational approach based on standardized teaching units comprising slides, short videos, runnable labs, and related papers. This organization enables consistency for both the students' learning experience and the reuse and grading by instructors. We demonstrate GUIDE in practice with three representative units: VeriThoughts for reasoning and formal-verification-backed RTL generation, enhanced LLM-aided testbench generation, and LLMPirate for IP Piracy. We also provide details for four example course instances (GUIDE4ChipDesign, Build your ASIC, GUIDE4HardwareSecurity, and Hardware Design) that assemble GUIDE units into full semester offerings, learning outcomes, and capstone projects, all based on proven materials. For example, the GUIDE4HardwareSecurity course includes a project on LLM-aided hardware Trojan insertion that has been successfully deployed in the classroom and in Cybersecurity Games and Conference (CSAW), a student competition and academic conference for cybersecurity. We also organized an NYU Cognichip Hackathon, engaging students across 24 international teams in AI-assisted RTL design workflows. The GUIDE repository is open for contributions and available at: https://github.com/FCHXWH823/LLM4ChipDesign.

ET · Mar 26
EPAR: Electromagnetic Pathways to Architectural Reliability in Quantum Processors

Navnil Choudhury, Yizhuo Tan, Jiaqi Yu et al.

As superconducting processors scale, understanding how physical layout shapes qubit interactions is essential for architectural reliability. Existing methods offer limited insight into how electromagnetic design choices translate into execution-level behavior. We present EPAR, an electromagnetic-to-architecture framework that predicts robustness early directly from physical design by reconstructing how design distortion modifies the effective Hamiltonian, reroutes mediated connectivity, and influences control-pulse response. Across all tested layouts, EPAR's structural scores show 100% agreement with two-qubit error trends yet reveal over 10X robustness differences among edges with identical calibrated error rates, going beyond conventional metrics to provide improved and actionable compiler guidance.

LG · Mar 11, 2025 · Code
Enhancing Large Language Models for Hardware Verification: A Novel SystemVerilog Assertion Dataset

Anand Menon, Samit S Miftah, Shamik Kundu et al.

Hardware verification is crucial in modern SoC design, consuming around 70% of development time. SystemVerilog assertions ensure correct functionality. However, existing industrial practices rely on manual efforts for assertion generation, which becomes increasingly untenable as hardware systems become complex. Recent research shows that Large Language Models (LLMs) can automate this process. However, proprietary SOTA models like GPT-4o often generate inaccurate assertions and require expensive licenses, while smaller open-source LLMs need fine-tuning to manage HDL code complexities. To address these issues, we introduce **VERT**, an open-source dataset designed to enhance SystemVerilog assertion generation using LLMs. VERT enables researchers in academia and industry to fine-tune open-source models, outperforming larger proprietary ones in both accuracy and efficiency while ensuring data privacy through local fine-tuning and eliminating costly licenses. The dataset is curated by systematically augmenting variables from open-source HDL repositories to generate synthetic code snippets paired with corresponding assertions. Experimental results demonstrate that fine-tuned models like Deepseek Coder 6.7B and Llama 3.1 8B outperform GPT-4o, achieving up to 96.88% improvement over base models and 24.14% over GPT-4o on platforms including OpenTitan, CVA6, OpenPiton and Pulpissimo. VERT is available at https://github.com/AnandMenon12/VERT.

LG · Nov 3, 2025
HyperNQ: A Hypergraph Neural Network Decoder for Quantum LDPC Codes

Ameya S. Bhave, Navnil Choudhury, Kanad Basu

Quantum computing requires effective error correction strategies to mitigate noise and decoherence. Quantum Low-Density Parity-Check (QLDPC) codes have emerged as a promising solution for scalable Quantum Error Correction (QEC) applications by supporting constant-rate encoding and a sparse parity-check structure. However, decoding QLDPC codes via traditional approaches such as Belief Propagation (BP) suffers from poor convergence in the presence of short cycles. Machine learning techniques like Graph Neural Networks (GNNs) utilize learned message passing over their node features; however, they are restricted to pairwise interactions on Tanner graphs, which limits their ability to capture higher-order correlations. In this work, we propose HyperNQ, the first Hypergraph Neural Network (HGNN)-based QLDPC decoder that captures higher-order stabilizer constraints by utilizing hyperedges, thus enabling highly expressive and compact decoding. We use a two-stage message passing scheme and evaluate the decoder over the pseudo-threshold region. Below the pseudo-threshold mark, HyperNQ improves the Logical Error Rate (LER) up to 84% over BP and 50% over GNN-based strategies, demonstrating enhanced performance over the existing state-of-the-art decoders.
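
As a rough illustration of two-stage message passing on a hypergraph, the sketch below first aggregates node features into each hyperedge and then aggregates hyperedge messages back into each node. It is not the HyperNQ architecture: the toy hypergraph, the feature values, the function name `two_stage_pass`, and the use of plain averaging in place of learned updates are all assumptions made for illustration.

```python
# Hypothetical toy hypergraph: each hyperedge lists the nodes it joins.
hyperedges = [[0, 1, 2], [1, 2, 3], [0, 3]]
# Hypothetical scalar feature per node.
node_feats = [1.0, 0.0, 0.0, 1.0]


def two_stage_pass(node_feats, hyperedges):
    """One round of node -> hyperedge -> node averaging (no learned weights)."""
    # Stage 1: every hyperedge averages the features of its member nodes.
    edge_msgs = [sum(node_feats[v] for v in e) / len(e) for e in hyperedges]
    # Stage 2: every node averages the messages of the hyperedges containing it.
    updated = []
    for v in range(len(node_feats)):
        incident = [edge_msgs[i] for i, e in enumerate(hyperedges) if v in e]
        # A node with no incident hyperedge keeps its old feature.
        updated.append(sum(incident) / len(incident) if incident else node_feats[v])
    return edge_msgs, updated


edge_msgs, updated = two_stage_pass(node_feats, hyperedges)
print(edge_msgs, updated)
```

Each hyperedge aggregates over all of its member nodes in a single step, which is the higher-order grouping the abstract contrasts with pairwise edges.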

QUANT-PH · Apr 20, 2024
PristiQ: A Co-Design Framework for Preserving Data Security of Quantum Learning in the Cloud

Zhepeng Wang, Yi Sheng, Nirajan Koirala et al.

Benefiting from cloud computing, today's early-stage quantum computers can be remotely accessed via cloud services, known as Quantum-as-a-Service (QaaS). However, this paradigm poses a high risk of data leakage in quantum machine learning (QML). To run a QML model with QaaS, users must first compile their quantum circuits locally, including the data-encoding subcircuit, and then send the compiled circuit to the QaaS provider for execution. If the QaaS provider is untrustworthy, the subcircuit that encodes the raw data can be easily stolen. Therefore, we propose a co-design framework for preserving the data security of QML with the QaaS paradigm, namely PristiQ. By introducing an encryption subcircuit with extra secure qubits associated with a user-defined security key, the security of data can be greatly enhanced. In addition, an automatic search algorithm is proposed to optimize the model to maintain its performance on the encrypted quantum data. Experimental results in simulation and on an actual IBM quantum computer both prove the ability of PristiQ to provide high security for the quantum data while maintaining the model performance in QML.

CR · Nov 21, 2024
GenBFA: An Evolutionary Optimization Approach to Bit-Flip Attacks on LLMs

Sanjay Das, Swastik Bhattacharya, Souvik Kundu et al.

Large Language Models (LLMs) have revolutionized natural language processing (NLP), excelling in tasks like text generation and summarization. However, their increasing adoption in mission-critical applications raises concerns about hardware-based threats, particularly bit-flip attacks (BFAs). BFAs, enabled by fault injection methods such as Rowhammer, target model parameters in memory, compromising both integrity and performance. Identifying critical parameters for BFAs in the vast parameter space of LLMs poses significant challenges. While prior research suggests transformer-based architectures are inherently more robust to BFAs compared to traditional deep neural networks, we challenge this assumption. For the first time, we demonstrate that as few as three bit-flips can cause catastrophic performance degradation in an LLM with billions of parameters. Current BFA techniques are inadequate for exploiting this vulnerability due to the difficulty of efficiently identifying critical parameters within the immense parameter space. To address this, we propose AttentionBreaker, a novel framework tailored for LLMs that enables efficient traversal of the parameter space to identify critical parameters. Additionally, we introduce GenBFA, an evolutionary optimization strategy designed to refine the search further, isolating the most critical bits for an efficient and effective attack. Empirical results reveal the profound vulnerability of LLMs to AttentionBreaker. For example, merely three bit-flips (4.129 x 10^-9% of total parameters) in the LLaMA3-8B-Instruct 8-bit quantized (W8) model result in a complete performance collapse: accuracy on MMLU tasks drops from 67.3% to 0%, and Wikitext perplexity skyrockets from 12.6 to 4.72 x 10^5. These findings underscore the effectiveness of AttentionBreaker in uncovering and exploiting critical vulnerabilities within LLM architectures.
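
To make the notion of a single bit-flip concrete, the sketch below flips one bit of one signed 8-bit weight, the storage format of a W8-quantized model. It shows only the fault primitive and does not reproduce the AttentionBreaker or GenBFA search; the helper name `flip_bit_int8` and the example weight value are hypothetical.

```python
def flip_bit_int8(value, bit):
    """Flip one bit of a signed 8-bit integer stored in two's complement."""
    if not -128 <= value <= 127:
        raise ValueError("value does not fit in int8")
    if not 0 <= bit <= 7:
        raise ValueError("bit index must be in 0..7")
    raw = value & 0xFF          # reinterpret as an unsigned byte
    raw ^= 1 << bit             # toggle the chosen bit
    return raw - 256 if raw >= 128 else raw  # back to signed


weight = 3  # hypothetical quantized weight
print(flip_bit_int8(weight, 0))  # flip the least significant bit
print(flip_bit_int8(weight, 7))  # flip the sign bit
```

Flipping the least significant bit changes the weight by 1, while flipping the sign bit moves it by 128, which is why the choice of bit matters so much to an attacker.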

CR · Dec 14, 2025
COBRA: Catastrophic Bit-flip Reliability Analysis of State-Space Models

Sanjay Das, Swastik Bhattacharya, Shamik Kundu et al.

State-space models (SSMs), exemplified by the Mamba architecture, have recently emerged as state-of-the-art sequence-modeling frameworks, offering linear-time scalability together with strong performance in long-context settings. Owing to their unique combination of efficiency, scalability, and expressive capacity, SSMs have become compelling alternatives to transformer-based models, which suffer from the quadratic computational and memory costs of attention mechanisms. As SSMs are increasingly deployed in real-world applications, it is critical to assess their susceptibility to both software- and hardware-level threats to ensure secure and reliable operation. Among such threats, hardware-induced bit-flip attacks (BFAs) pose a particularly severe risk by corrupting model parameters through memory faults, thereby undermining model accuracy and functional integrity. To investigate this vulnerability, we introduce RAMBO, the first BFA framework specifically designed to target Mamba-based architectures. Through experiments on the Mamba-1.4b model with LAMBADA benchmark, a cloze-style word-prediction task, we demonstrate that flipping merely a single critical bit can catastrophically reduce accuracy from 74.64% to 0% and increase perplexity from 18.94 to 3.75 x 10^6. These results demonstrate the pronounced fragility of SSMs to adversarial perturbations.

AR · Nov 23, 2025
SafeCiM: Investigating Resilience of Hybrid Floating-Point Compute-in-Memory Deep Learning Accelerators

Swastik Bhattacharya, Sanjay Das, Anand Menon et al.

Deep Neural Networks (DNNs) continue to grow in complexity with Large Language Models (LLMs) incorporating vast numbers of parameters. Handling these parameters efficiently in traditional accelerators is limited by data-transmission bottlenecks, motivating Compute-in-Memory (CiM) architectures that integrate computation within or near memory to reduce data movement. Recent work has explored CiM designs using Floating-Point (FP) and Integer (INT) operations. FP computations typically deliver higher output quality due to their wider dynamic range and precision, benefiting precision-sensitive Generative AI applications. These include models such as LLMs, thus driving advancements in FP-CiM accelerators. However, the vulnerability of FP-CiM to hardware faults remains underexplored, posing a major reliability concern in mission-critical settings. To address this gap, we systematically analyze hardware fault effects in FP-CiM by introducing bit-flip faults at key computational stages, including digital multipliers, CiM memory cells, and digital adder trees. Experiments with Convolutional Neural Networks (CNNs) such as AlexNet and state-of-the-art LLMs including LLaMA-3.2-1B and Qwen-0.3B-Base reveal how faults at each stage affect inference accuracy. Notably, a single adder fault can reduce LLM accuracy to 0%. Based on these insights, we propose a fault-resilient design, SafeCiM, that mitigates fault impact far better than a naive FP-CiM with a pre-alignment stage. For example, with 4096 MAC units, SafeCiM reduces accuracy degradation by up to 49x for a single adder fault compared to the baseline FP-CiM architecture.

LG · Apr 2, 2024
Enhancing Functional Safety in Automotive AMS Circuits through Unsupervised Machine Learning

Ayush Arunachalam, Ian Kintz, Suvadeep Banerjee et al.

Given the widespread use of safety-critical applications in the automotive field, it is crucial to ensure the Functional Safety (FuSa) of circuits and components within automotive systems. The Analog and Mixed-Signal (AMS) circuits prevalent in these systems are more vulnerable to faults induced by parametric perturbations, noise, environmental stress, and other factors, in comparison to their digital counterparts. However, their continuous signal characteristics present an opportunity for early anomaly detection, enabling the implementation of safety mechanisms to prevent system failure. To address this need, we propose a novel framework based on unsupervised machine learning for early anomaly detection in AMS circuits. The proposed approach involves injecting anomalies at various circuit locations and individual components to create a diverse and comprehensive anomaly dataset, followed by the extraction of features from the observed circuit signals. Subsequently, we employ clustering algorithms to facilitate anomaly detection. Finally, we propose a time series framework to enhance and expedite anomaly detection performance. Our approach encompasses a systematic analysis of anomaly abstraction at multiple levels pertaining to the automotive domain, from hardware- to block-level, where anomalies are injected to create diverse fault scenarios. By monitoring the system behavior under these anomalous conditions, we capture the propagation of anomalies and their effects at different abstraction levels, thereby potentially paving the way for the implementation of reliable safety mechanisms to ensure the FuSa of automotive SoCs. Our experimental findings indicate that our approach achieves 100% anomaly detection accuracy and significantly optimizes the associated latency by 5X, underscoring the effectiveness of our devised solution.

MTRL-SCI · Jan 1, 2022
Machine Learning-enhanced Efficient Spectroscopic Ellipsometry Modeling

Ayush Arunachalam, S. Novia Berriel, Parag Banerjee et al.

Over the recent years, there has been an extensive adoption of Machine Learning (ML) in a plethora of real-world applications, ranging from computer vision to data mining and drug discovery. In this paper, we utilize ML to facilitate efficient film fabrication, specifically Atomic Layer Deposition (ALD). In order to make advances in ALD process development, which is utilized to generate thin films, and its subsequent accelerated adoption in industry, it is imperative to understand the underlying atomistic processes. Towards this end, in situ techniques for monitoring film growth, such as Spectroscopic Ellipsometry (SE), have been proposed. However, in situ SE is associated with complex hardware and, hence, is resource intensive. To address these challenges, we propose an ML-based approach to expedite film thickness estimation. The proposed approach has tremendous implications of faster data acquisition, reduced hardware complexity and easier integration of spectroscopic ellipsometry for in situ monitoring of film thickness deposition. Our experimental results involving SE of TiO2 demonstrate that the proposed ML-based approach furnishes promising thickness prediction accuracy results of 88.76% within +/-1.5 nm and 85.14% within +/-0.5 nm intervals. Furthermore, we furnish accuracy results up to 98% at lower thicknesses, which is a significant improvement over existing SE-based analysis, thereby making our solution a viable option for thickness estimation of ultrathin films.

LG · Jan 8, 2021
Exploring Fault-Energy Trade-offs in Approximate DNN Hardware Accelerators

Ayesha Siddique, Kanad Basu, Khaza Anuarul Hoque

Systolic array-based deep neural network (DNN) accelerators have recently gained prominence for their low computational cost. However, their high energy consumption poses a bottleneck to their deployment in energy-constrained devices. To address this problem, approximate computing can be employed at the cost of some tolerable accuracy loss. However, such small accuracy variations may increase the sensitivity of DNNs towards undesired subtle disturbances, such as permanent faults. The impact of permanent faults in accurate DNNs has been thoroughly investigated in the literature. Conversely, the impact of permanent faults in approximate DNN accelerators (AxDNNs) is as yet under-explored. The impact of such faults may vary with the fault bit positions, activation functions and approximation errors in AxDNN layers. Such dynamicity poses a considerable challenge to exploring the trade-off between their energy efficiency and fault resilience in AxDNNs. Towards this, we present an extensive layer-wise and bit-wise fault resilience and energy analysis of different AxDNNs, using the state-of-the-art Evoapprox8b signed multipliers. In particular, we vary the stuck-at-0, stuck-at-1 fault-bit positions, and activation functions to study their impact using the most widely used MNIST and Fashion-MNIST datasets. Our quantitative analysis shows that the permanent faults exacerbate the accuracy loss in AxDNNs when compared to the accurate DNN accelerators. For instance, a permanent fault in AxDNNs can lead to up to 66% accuracy loss, whereas the same faulty bit can lead to only 9% accuracy loss in an accurate DNN accelerator. Our results demonstrate that the fault resilience in AxDNNs is orthogonal to the energy efficiency.
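
The stuck-at fault model mentioned above can be sketched as follows: a permanent fault forces one bit of a value to 0 or 1 regardless of the data. The helper name `stuck_at`, the unsigned 8-bit width, and the example operand are assumptions for illustration and are not taken from the paper's setup.

```python
def stuck_at(value, bit, stuck_value, width=8):
    """Return `value` with one bit permanently forced to 0 or 1."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    value &= (1 << width) - 1        # keep only `width` bits (unsigned view)
    mask = 1 << bit
    return value | mask if stuck_value else value & ~mask


operand = 0b00000101  # hypothetical 8-bit value (decimal 5)
print(stuck_at(operand, 7, 1))  # stuck-at-1 on the most significant bit
print(stuck_at(operand, 0, 0))  # stuck-at-0 on the least significant bit
print(stuck_at(operand, 1, 0))  # stuck-at-0 on a bit that is already 0
```

The third call leaves the value unchanged because the data already matches the stuck level, which illustrates why the observed impact depends on both the fault-bit position and the data passing through it.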

CR · Oct 25, 2020
Security Assessment of Interposer-based Chiplet Integration

Mohammed Shayan, Kanad Basu, Ramesh Karri

With transistor scaling reaching its limits, interposer-based integration of dies (chiplets) is gaining traction. Such an interposer-based integration enables finer and tighter interconnect pitch than traditional system-on-packages and offers two key benefits: 1. It reduces design-to-market time by bypassing the time-consuming process of verification and fabrication. 2. It reduces the design cost by reusing chiplets. While black-boxing of the slow design stages cuts down the design time, it raises significant security concerns. We study the security implications of the emerging interposer-based integration methodology. The black-boxed design stages deploy security measures against hardware Trojans, reverse engineering, and intellectual property piracy in traditional systems-on-chip (SoC) designs and hence are not suitable for interposer-based integration. We propose using functionally diverse chiplets to detect and thwart hardware Trojans and use the inherent logic redundancy to shore up anti-piracy measures. Our proposals do not rely on access to the black-box design stages. We evaluate the security, time and cost benefits of our plan by implementing a MIPS processor, a DCT core, and an AES core using various IPs from the Xilinx CORE GENERATOR IP catalog, on an interposer-based Xilinx FPGA.

CR · Sep 16, 2020
Hardware-Assisted Detection of Firmware Attacks in Inverter-Based Cyberphysical Microgrids

Abraham Peedikayil Kuruvila, Ioannis Zografopoulos, Kanad Basu et al.

The electric grid modernization effort relies on the extensive deployment of microgrid (MG) systems. MGs integrate renewable resources and energy storage systems, allowing them to generate economical, zero-carbon-footprint electricity, deliver sustainable energy to communities using local energy resources, and enhance grid resilience. MGs as cyberphysical systems include interconnected devices that measure, control, and actuate energy resources and loads. For optimal operation, cyberphysical MGs regulate the onsite energy generation through support functions enabled by smart inverters. Smart inverters, being consumer electronic firmware-based devices, are susceptible to increasing security threats. If inverters are maliciously controlled, they can significantly disrupt MG operation and electricity delivery as well as impact grid stability. In this paper, we demonstrate the impact of denial-of-service (DoS) as well as controller and setpoint modification attacks on a simulated MG system. Furthermore, we employ custom-built hardware performance counters (HPCs) as design-for-security (DfS) primitives to detect malicious firmware modifications on MG inverters. The proposed HPCs periodically measure the order of various instruction types within the MG inverter's firmware code. Our experiments illustrate that the firmware modifications are successfully identified by our custom-built HPCs utilizing various machine learning-based classifiers.

CR · Jun 11, 2020
Benchmarking at the Frontier of Hardware Security: Lessons from Logic Locking

Benjamin Tan, Ramesh Karri, Nimisha Limaye et al.

Integrated circuits (ICs) are the foundation of all computing systems. They comprise high-value hardware intellectual property (IP) that are at risk of piracy, reverse-engineering, and modifications while making their way through the geographically-distributed IC supply chain. On the frontier of hardware security are various design-for-trust techniques that claim to protect designs from untrusted entities across the design flow. Logic locking is one technique that promises protection from the gamut of threats in IC manufacturing. In this work, we perform a critical review of logic locking techniques in the literature, and expose several shortcomings. Taking inspiration from other cybersecurity competitions, we devise a community-led benchmarking exercise to address the evaluation deficiencies. In reflecting on this process, we shed new light on deficiencies in evaluation of logic locking and reveal important future directions. The lessons learned can guide future endeavors in other areas of hardware security.

LG · Jun 5, 2020
High-level Modeling of Manufacturing Faults in Deep Neural Network Accelerators

Shamik Kundu, Ahmet Soyyiğit, Khaza Anuarul Hoque et al.

The advent of data-driven real-time applications requires the implementation of Deep Neural Networks (DNNs) on Machine Learning accelerators. Google's Tensor Processing Unit (TPU) is one such neural network accelerator that uses systolic array-based matrix multiplication hardware for computation in its crux. Manufacturing faults at any state element of the matrix multiplication unit can cause unexpected errors in these inference networks. In this paper, we propose a formal model of permanent faults and their propagation in a TPU using the Discrete-Time Markov Chain (DTMC) formalism. The proposed model is analyzed using the probabilistic model checking technique to reason about the likelihood of faulty outputs. The obtained quantitative results show that the classification accuracy is sensitive to the type of permanent faults as well as their location, bit position and the number of layers in the neural network. The conclusions from our theoretical model have been validated using experiments on a digit recognition-based DNN.
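
A Discrete-Time Markov Chain of fault propagation can be sketched in a few lines. The three states and every transition probability below are invented for illustration and are not the paper's TPU model; the code simply propagates a state distribution to obtain the probability of reaching a faulty output within a given number of steps, which is the kind of question probabilistic model checking answers.

```python
# Hypothetical 3-state chain: 0 = correct value, 1 = corrupted partial sum,
# 2 = faulty output (absorbing). All probabilities are made up.
P = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.0, 1.0],
]


def step(dist, matrix):
    """Advance a state distribution by one DTMC transition."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def prob_faulty_within(k, matrix=P, start=0, faulty=2):
    """Probability of being in the absorbing faulty state after k steps."""
    dist = [0.0] * len(matrix)
    dist[start] = 1.0
    for _ in range(k):
        dist = step(dist, matrix)
    return dist[faulty]


for k in (1, 2, 10, 50):
    print(k, prob_faulty_within(k))
```

Because the faulty state is absorbing, the probability returned never decreases as the number of steps grows.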

CR · May 7, 2020
Defending Hardware-based Malware Detectors against Adversarial Attacks

Abraham Peedikayil Kuruvila, Shamik Kundu, Kanad Basu

In the era of Internet of Things (IoT), Malware has been proliferating exponentially over the past decade. Traditional anti-virus software is ineffective against modern complex Malware. In order to address this challenge, researchers have proposed Hardware-assisted Malware Detection (HMD) using Hardware Performance Counters (HPCs). The HPCs are used to train a set of Machine learning (ML) classifiers, which in turn, are used to distinguish benign programs from Malware. Recently, adversarial attacks have been designed by introducing perturbations in the HPC traces using an adversarial sample predictor to misclassify a program for specific HPCs. These attacks are designed with the basic assumption that the attacker is aware of the HPCs being used to detect Malware. Since modern processors consist of hundreds of HPCs, restricting to only a few of them for Malware detection aids the attacker. In this paper, we propose a Moving target defense (MTD) for this adversarial attack by designing multiple ML classifiers trained on different sets of HPCs. The MTD randomly selects a classifier, thus confusing the attacker about the HPCs or the number of classifiers applied. We have developed an analytical model which proves that the probability of an attacker guessing the perfect HPC-classifier combination for MTD is extremely low (in the range of $10^{-1864}$ for a system with 20 HPCs). Our experimental results prove that the proposed defense is able to improve the classification accuracy of HPC traces that have been modified through an adversarial sample generator by up to 31.5%, for a near perfect (99.4%) restoration of the original accuracy.
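
The random-selection idea behind the moving target defense can be sketched as below. Everything concrete here is hypothetical: the HPC index sets, the threshold classifiers standing in for trained ML models, and the function names. The sketch does not reproduce the paper's analytical probability model.

```python
import random
from math import comb

# Hypothetical HPC subsets, one per classifier (indices into a 12-entry trace).
HPC_SETS = [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11)]


def make_classifier(hpc_ids, threshold):
    """Stand-in for a trained model: flags a trace if selected HPCs sum high."""
    def classify(trace):
        return sum(trace[i] for i in hpc_ids) > threshold
    return classify


classifiers = [make_classifier(ids, threshold=2.0) for ids in HPC_SETS]


def mtd_predict(trace, rng):
    """Pick a classifier at random for this inference, then classify."""
    idx = rng.randrange(len(classifiers))
    return idx, classifiers[idx](trace)


rng = random.Random(0)            # seeded only to make the demo repeatable
trace = [1.0] * 12                # hypothetical HPC readings
print(mtd_predict(trace, rng))
print(comb(20, 4))                # number of 4-HPC subsets out of 20 HPCs
```

An attacker who perturbs only one HPC subset cannot know in advance which classifier, and therefore which counters, will judge a given trace.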

AR · Apr 6, 2020
Hardware Trojan Detection Using Controlled Circuit Aging

Virinchi Roy Surabhi, Prashanth Krishnamurthy, Hussam Amrouch et al.

This paper reports a novel approach that uses transistor aging in an integrated circuit (IC) to detect hardware Trojans. When a transistor is aged, it results in delays along several paths of the IC. This increase in delay results in timing violations that manifest as timing errors at the output of the IC during its operation. We present experiments using aging-aware standard cell libraries to illustrate the usefulness of the technique in detecting hardware Trojans. Combining IC aging with over-clocking produces a pattern of bit errors at the IC output by the induced timing violations. We use machine learning to learn the bit error distribution at the output of a clean IC. We differentiate the divergence in the pattern of bit errors because of a Trojan in the IC from this baseline distribution. We simulate the golden IC and show robustness to IC-to-IC manufacturing variations. The approach is effective and can detect a Trojan even if we place it far off the critical paths. Results on benchmarks from the Trust-hub show a detection accuracy of ≥99%.

LG · Feb 11, 2018
Analyzing and Mitigating the Impact of Permanent Faults on a Systolic Array Based Neural Network Accelerator

Jeff Zhang, Tianyu Gu, Kanad Basu et al.

Due to their growing popularity and computational cost, deep neural networks (DNNs) are being targeted for hardware acceleration. A popular architecture for DNN acceleration, adopted by the Google Tensor Processing Unit (TPU), utilizes a systolic array based matrix multiplication unit at its core. This paper deals with the design of fault-tolerant, systolic array based DNN accelerators for high defect rate technologies. To this end, we empirically show that the classification accuracy of a baseline TPU drops significantly even at extremely low fault rates (as low as 0.006%). We then propose two novel strategies, fault-aware pruning (FAP) and fault-aware pruning+retraining (FAP+T), that enable the TPU to operate at fault rates of up to 50%, with negligible drop in classification accuracy (as low as 0.1%) and no run-time performance overhead. The FAP+T does introduce a one-time retraining penalty per TPU chip before it is deployed, but we propose optimizations that reduce this one-time penalty to under 12 minutes. The penalty is then amortized over the entire lifetime of the TPU's operation.
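
The idea of fault-aware pruning can be sketched as zeroing every weight that would be computed on a faulty multiply-accumulate (MAC) unit. The mapping used below, where weight (r, c) lands on MAC (r mod N, c mod N) of an N x N array, is an assumption chosen for illustration, as are the function name `fault_aware_prune` and the example fault location; the paper's actual mapping and retraining step are not reproduced.

```python
def fault_aware_prune(weights, faulty_macs, array_size):
    """Return a copy of `weights` with entries on faulty MACs set to zero.

    Assumed mapping (illustrative only): weight (r, c) is computed on
    MAC (r % array_size, c % array_size).
    """
    pruned = [row[:] for row in weights]   # leave the input untouched
    for r, row in enumerate(pruned):
        for c in range(len(row)):
            if (r % array_size, c % array_size) in faulty_macs:
                row[c] = 0.0
    return pruned


weights = [[1.0] * 4 for _ in range(4)]    # hypothetical 4x4 weight matrix
faulty = {(0, 1)}                          # hypothetical faulty MAC in a 2x2 array
pruned = fault_aware_prune(weights, faulty, array_size=2)
for row in pruned:
    print(row)
```

Under this assumed mapping one faulty MAC in a 2 x 2 array removes a quarter of the weights of a 4 x 4 layer, which hints at why retraining after pruning helps recover accuracy at high fault rates.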