CRSep 13, 2022
A Neural Network-based SAT-Resilient Obfuscation Towards Enhanced Logic LockingRakibul Hassan, Gaurav Kolhe, Setareh Rafatirad et al.
Logic obfuscation is introduced as a pivotal defense against multiple hardware threats on Integrated Circuits (ICs), including reverse engineering (RE) and intellectual property (IP) theft. The effectiveness of logic obfuscation is challenged by the recently introduced Boolean satisfiability (SAT) attack and its variants. A plethora of countermeasures has also been proposed to thwart the SAT attack. Irrespective of the implemented defense against SAT attacks, large power, performance, and area overheads are indispensable. In contrast, we propose a cognitive solution: a neural network-based unSAT clause translator, SATConda, that incurs a minimal area and power overhead while preserving the original functionality with impenetrable security. SATConda is incubated with an unSAT clause generator that translates the existing conjunctive normal form (CNF) through minimal perturbations such as the inclusion of pair of inverters or buffers or adding a new lightweight unSAT block depending on the provided CNF. For efficient unSAT clause generation, SATConda is equipped with a multi-layer neural network that first learns the dependencies of features (literals and clauses), followed by a long-short-term-memory (LSTM) network to validate and backpropagate the SAT-hardness for better learning and translation. Our proposed SATConda is evaluated on ISCAS85 and ISCAS89 benchmarks and is seen to defend against multiple state-of-the-art successfully SAT attacks devised for hardware RE. In addition, we also evaluate our proposed SATCondas empirical performance against MiniSAT, Lingeling and Glucose SAT solvers that form the base for numerous existing deobfuscation SAT attacks.
CVOct 1, 2023
SMOOT: Saliency Guided Mask Optimized Online TrainingAli Karkehabadi, Houman Homayoun, Avesta Sasan
Deep Neural Networks are powerful tools for understanding complex patterns and making decisions. However, their black-box nature impedes a complete understanding of their inner workings. Saliency-Guided Training (SGT) methods try to highlight the prominent features in the model's training based on the output to alleviate this problem. These methods use back-propagation and modified gradients to guide the model toward the most relevant features while keeping the impact on the prediction accuracy negligible. SGT makes the model's final result more interpretable by masking input partially. In this way, considering the model's output, we can infer how each segment of the input affects the output. In the particular case of image as the input, masking is applied to the input pixels. However, the masking strategy and number of pixels which we mask, are considered as a hyperparameter. Appropriate setting of masking strategy can directly affect the model's training. In this paper, we focus on this issue and present our contribution. We propose a novel method to determine the optimal number of masked images based on input, accuracy, and model loss during the training. The strategy prevents information loss which leads to better accuracy values. Also, by integrating the model's performance in the strategy formula, we show that our model represents the salient features more meaningful. Our experimental results demonstrate a substantial improvement in both model accuracy and the prominence of saliency, thereby affirming the effectiveness of our proposed solution.
LGApr 7, 2022
Adaptive-Gravity: A Defense Against Adversarial SamplesAli Mirzaeian, Zhi Tian, Sai Manoj P D et al.
This paper presents a novel model training solution, denoted as Adaptive-Gravity, for enhancing the robustness of deep neural network classifiers against adversarial examples. We conceptualize the model parameters/features associated with each class as a mass characterized by its centroid location and the spread (standard deviation of the distance) of features around the centroid. We use the centroid associated with each cluster to derive an anti-gravity force that pushes the centroids of different classes away from one another during network training. Then we customized an objective function that aims to concentrate each class's features toward their corresponding new centroid, which has been obtained by anti-gravity force. This methodology results in a larger separation between different masses and reduces the spread of features around each centroid. As a result, the samples are pushed away from the space that adversarial examples could be mapped to, effectively increasing the degree of perturbation needed for making an adversarial example. We have implemented this training solution as an iterative method consisting of four steps at each iteration: 1) centroid extraction, 2) anti-gravity force calculation, 3) centroid relocation, and 4) gravity training. Gravity's efficiency is evaluated by measuring the corresponding fooling rates against various attack models, including FGSM, MIM, BIM, and PGD using LeNet and ResNet110 networks, benchmarked against MNIST and CIFAR10 classification problems. Test results show that Gravity not only functions as a powerful instrument to robustify a model against state-of-the-art adversarial attacks but also effectively improves the model training accuracy.
CRApr 4, 2023
Side Channel-Assisted Inference Leakage from Machine Learning-based ECG ClassificationJialin Liu, Ning Miao, Chongzhou Fang et al.
The Electrocardiogram (ECG) measures the electrical cardiac activity generated by the heart to detect abnormal heartbeat and heart attack. However, the irregular occurrence of the abnormalities demands continuous monitoring of heartbeats. Machine learning techniques are leveraged to automate the task to reduce labor work needed during monitoring. In recent years, many companies have launched products with ECG monitoring and irregular heartbeat alert. Among all classification algorithms, the time series-based algorithm dynamic time warping (DTW) is widely adopted to undertake the ECG classification task. Though progress has been achieved, the DTW-based ECG classification also brings a new attacking vector of leaking the patients' diagnosis results. This paper shows that the ECG input samples' labels can be stolen via a side-channel attack, Flush+Reload. In particular, we first identify the vulnerability of DTW for ECG classification, i.e., the correlation between warping path choice and prediction results. Then we implement an attack that leverages Flush+Reload to monitor the warping path selection with known ECG data and then build a predictor for constructing the relation between warping path selection and labels of input ECG samples. Based on experiments, we find that the Flush+Reload-based inference leakage can achieve an 84.0\% attacking success rate to identify the labels of the two samples in DTW.
84.9LGMay 25
CausalFlow: Causal Attribution and Counterfactual Repair for LLM Agent FailuresAkash Bonagiri, Devang Borkar, Gerard Janno Anderias et al.
Large language model (LLM) agents frequently fail on multi-step tasks involving reasoning, tool use, and environment interaction. While such failures are typically logged or retried heuristically, they contain structured signals about where execution broke down. We introduce CausalFlow, an interventional framework that converts failed agent traces into minimal counterfactual repairs and reusable supervision. CausalFlow models execution traces as sequential chains of dependent steps and computes Causal Responsibility Scores(CRS) via step-level counterfactual intervention to identify failure-inducing steps. For these steps, we generate minimally edited repairs that flip the final outcome to success, producing validated contrastive pairs of the form (wrong step, corrected step). CausalFlow supports two complementary uses: targeted test-time repair that recovers from failures with minimal behavioral drift, and training-time supervision suitable for offline preference optimization or reward modeling. Across four benchmarks spanning mathematical reasoning, code generation, question answering, and medical browsing, CausalFlow converts failed executions into validated minimal repairs with high minimality and causal-consensus scores, and demonstrates that causal attribution is necessary for reliable improvement across diverse agent tasks, outperforming heuristic refinement in complex retrieval settings while producing more localized repairs throughout. These results demonstrate that interventional analysis over structured execution traces provides a principled and scalable mechanism for transforming agent failures into reliability gains and learning-ready supervision.
32.2CRMar 20
Kumo: A Security-Focused Serverless Cloud SimulatorWei Shao, Khaled Khasawneh, Setareh Rafatirad et al.
Serverless computing abstracts infrastructure management but also obscures system-level behaviors that can introduce security risks. Prior work has shown that serverless platforms are vulnerable to attacks exploiting shared execution environments, including attacker--victim co-location and denial-of-service through resource contention, yet analyzing these risks on production platforms is difficult due to limited observability, high cost, and lack of experimental control, while existing simulators primarily focus on performance and cost rather than security. We present Kumo, a security-focused simulator for serverless platforms that enables controlled, reproducible analysis of security risks arising from scheduling and resource sharing decisions. Kumo models invocation arrivals, scheduler placement, container reuse, resource contention, and queuing within a discrete-event framework, explicitly representing attackers and victims as first-class entities and providing metrics such as co-location probability, time to first co-location, invocation drop rate, and tail latency. Through two case studies, we show that scheduler choice is a first-order factor for co-location attacks, inducing orders-of-magnitude differences under identical workloads, while Denial-of-Service behavior is largely governed by system-level factors such as service time, queuing policy, and cluster capacity once contention dominates. These results highlight the need to distinguish scheduler-driven isolation risks from broader resource exhaustion vulnerabilities and position Kumo as a flexible foundation for systematic, security-aware exploration of serverless platforms.
5.5CVApr 28
SaliencyDecor: Enhancing Neural Network Interpretability through Feature DecorrelationAli Karkehabadi, Jamshid Hassanpour, Houman Homayoun et al.
Gradient-based saliency methods are widely used to interpret deep neural networks, yet they often produce noisy and unstable explanations that poorly align with semantically meaningful input features. We argue that a fundamental cause of this behavior lies in the geometry of learned representations: correlated feature dimensions diffuse attribution gradients across redundant directions, resulting in blurred and unreliable saliency maps. To address this issue, we identify feature correlation as a structural limitation of gradient-based interpretability and propose SaliencyDecor, a training framework that enforces feature decorrelation to improve attribution fidelity without modifying saliency methods or model architectures by reshaping the feature space toward orthogonality, our approach promotes more concentrated gradient flow and improves the fidelity of saliency-based explanations. SaliencyDecor jointly optimizes classification, prediction consistency under feature masking, and a decorrelation regularizer, requiring no architectural changes or inference-time overhead. Extensive experiments across multiple benchmarks and architectures demonstrate that our method produces substantially sharper and more object-focused saliency maps while simultaneously improving predictive performance, achieving accuracy gains across the datasets. These results establish our method as a principled mechanism for enhancing both interpretability and accuracy, challenging the conventional trade-off between explanation quality and model performance.
77.7ETMay 15
Lightweight Cross-Device Sleep Tracking on the WeBe Wearable PlatformWei Shao, Ehsan Kourkchi, Krishi Prashant Shah et al.
Wearable devices are widely used for continuous health monitoring, yet reliable sleep tracking on emerging platforms remains underexplored due to reliance on proprietary algorithms and device-specific activity representations. We present a lightweight and reproducible sleep tracking pipeline that operates directly on raw accelerometer signals. The method converts data into epoch-level activity features, applies temporal smoothing and normalized scoring, and performs sleep/wake classification using a globally calibrated threshold. We calibrate the model on the Multilevel Monitoring of Activity and Sleep in Healthy People (MMASH) dataset and evaluate it in a cross-device study using the WeBe wearable platform and a commercial ActiGraph device. On MMASH, the method achieves a mean absolute error of 41.6 minutes in Total Sleep Time (TST), with onset and offset errors of 6.3 and 7.4 minutes. On real-world WeBe data from three participants across five sessions, it achieves a mean TST error of 27.4 minutes and onset and offset errors of 13.9 and 8.0 minutes. In contrast, a commercial ActiGraph pipeline shows larger discrepancies relative to ground truth. These results demonstrate accurate and generalizable sleep tracking using a simple and reproducible pipeline.
68.4LGMay 4
STABLEVAL: Disagreement-Aware and Stable Evaluation of AI SystemsAkash Bonagiri, Gerard Janno Anderias, Saee Patil et al.
Human evaluation remains the primary standard for assessing modern AI systems, yet annotator disagreement, bias, and variability make system rankings fragile under standard majority vote aggregation. Majority vote discards annotator reliability and item-level ambiguity, often yielding unstable comparisons across annotator subsets. We introduce STABLEVAL, a disagreement-aware evaluation framework that models latent item correctness and annotator-specific confusion patterns to produce posterior expected item credit and calibrated agent-level scores. Unlike label-denoising approaches such as Dawid-Skene, STABLEVAL is explicitly designed for stable and uncertainty-aware system evaluation rather than hard label recovery. We formalize ranking stability as a first-class evaluation objective and analyze how aggregation methods preserve or distort underlying annotator behavior. Across controlled synthetic experiments and multiple real-world human-annotated benchmarks, majority vote exhibits increasing score error and ranking instability under annotator heterogeneity and adversarial noise, while STABLEVAL yields more stable and statistically grounded system rankings. These results demonstrate that modeling disagreement is essential for robust and reproducible AI evaluation.
LGMay 21, 2024
FFCL: Forward-Forward Net with Cortical Loops, Training and Inference on Edge Without BackpropagationAli Karkehabadi, Houman Homayoun, Avesta Sasan
The Forward-Forward Learning (FFL) algorithm is a recently proposed solution for training neural networks without needing memory-intensive backpropagation. During training, labels accompany input data, classifying them as positive or negative inputs. Each layer learns its response to these inputs independently. In this study, we enhance the FFL with the following contributions: 1) We optimize label processing by segregating label and feature forwarding between layers, enhancing learning performance. 2) By revising label integration, we enhance the inference process, reduce computational complexity, and improve performance. 3) We introduce feedback loops akin to cortical loops in the brain, where information cycles through and returns to earlier neurons, enabling layers to combine complex features from previous layers with lower-level features, enhancing learning efficiency.
CRDec 21, 2023
HW-V2W-Map: Hardware Vulnerability to Weakness Mapping Framework for Root Cause Analysis with GPT-assisted Mitigation SuggestionYu-Zheng Lin, Muntasir Mamun, Muhtasim Alam Chowdhury et al.
The escalating complexity of modern computing frameworks has resulted in a surge in the cybersecurity vulnerabilities reported to the National Vulnerability Database (NVD) by practitioners. Despite the fact that the stature of NVD is one of the most significant databases for the latest insights into vulnerabilities, extracting meaningful trends from such a large amount of unstructured data is still challenging without the application of suitable technological methodologies. Previous efforts have mostly concentrated on software vulnerabilities; however, a holistic strategy incorporates approaches for mitigating vulnerabilities, score prediction, and a knowledge-generating system that may extract relevant insights from the Common Weakness Enumeration (CWE) and Common Vulnerability Exchange (CVE) databases is notably absent. As the number of hardware attacks on Internet of Things (IoT) devices continues to rapidly increase, we present the Hardware Vulnerability to Weakness Mapping (HW-V2W-Map) Framework, which is a Machine Learning (ML) framework focusing on hardware vulnerabilities and IoT security. The architecture that we have proposed incorporates an Ontology-driven Storytelling framework, which automates the process of updating the ontology in order to recognize patterns and evolution of vulnerabilities over time and provides approaches for mitigating the vulnerabilities. The repercussions of vulnerabilities can be mitigated as a result of this, and conversely, future exposures can be predicted and prevented. Furthermore, our proposed framework utilized Generative Pre-trained Transformer (GPT) Large Language Models (LLMs) to provide mitigation suggestions.
CRApr 2, 2024
Generative AI-Based Effective Malware Detection for Embedded Computing SystemsSreenitha Kasarapu, Sanket Shukla, Rakibul Hassan et al.
One of the pivotal security threats for the embedded computing systems is malicious software a.k.a malware. With efficiency and efficacy, Machine Learning (ML) has been widely adopted for malware detection in recent times. Despite being efficient, the existing techniques require a tremendous number of benign and malware samples for training and modeling an efficient malware detector. Furthermore, such constraints limit the detection of emerging malware samples due to the lack of sufficient malware samples required for efficient training. To address such concerns, we introduce a code-aware data generation technique that generates multiple mutated samples of the limitedly seen malware by the devices. Loss minimization ensures that the generated samples closely mimic the limitedly seen malware and mitigate the impractical samples. Such developed malware is further incorporated into the training set to formulate the model that can efficiently detect the emerging malware despite having limited exposure. The experimental results demonstrates that the proposed technique achieves an accuracy of 90% in detecting limitedly seen malware, which is approximately 3x more than the accuracy attained by state-of-the-art techniques.
CRAug 19, 2025
Know Me by My Pulse: Toward Practical Continuous Authentication on Wearable Devices via Wrist-Worn PPGWei Shao, Zequan Liang, Ruoyu Zhang et al.
Biometric authentication using physiological signals offers a promising path toward secure and user-friendly access control in wearable devices. While electrocardiogram (ECG) signals have shown high discriminability, their intrusive sensing requirements and discontinuous acquisition limit practicality. Photoplethysmography (PPG), on the other hand, enables continuous, non-intrusive authentication with seamless integration into wrist-worn wearable devices. However, most prior work relies on high-frequency PPG (e.g., 75 - 500 Hz) and complex deep models, which incur significant energy and computational overhead, impeding deployment in power-constrained real-world systems. In this paper, we present the first real-world implementation and evaluation of a continuous authentication system on a smartwatch, We-Be Band, using low-frequency (25 Hz) multi-channel PPG signals. Our method employs a Bi-LSTM with attention mechanism to extract identity-specific features from short (4 s) windows of 4-channel PPG. Through extensive evaluations on both public datasets (PTTPPG) and our We-Be Dataset (26 subjects), we demonstrate strong classification performance with an average test accuracy of 88.11%, macro F1-score of 0.88, False Acceptance Rate (FAR) of 0.48%, False Rejection Rate (FRR) of 11.77%, and Equal Error Rate (EER) of 2.76%. Our 25 Hz system reduces sensor power consumption by 53% compared to 512 Hz and 19% compared to 128 Hz setups without compromising performance. We find that sampling at 25 Hz preserves authentication accuracy, whereas performance drops sharply at 20 Hz while offering only trivial additional power savings, underscoring 25 Hz as the practical lower bound. Additionally, we find that models trained exclusively on resting data fail under motion, while activity-diverse training improves robustness across physiological states.
CROct 28, 2025
FaRAccel: FPGA-Accelerated Defense Architecture for Efficient Bit-Flip Attack Resilience in Transformer ModelsNajmeh Nazari, Banafsheh Saber Latibari, Elahe Hosseini et al.
Forget and Rewire (FaR) methodology has demonstrated strong resilience against Bit-Flip Attacks (BFAs) on Transformer-based models by obfuscating critical parameters through dynamic rewiring of linear layers. However, the application of FaR introduces non-negligible performance and memory overheads, primarily due to the runtime modification of activation pathways and the lack of hardware-level optimization. To overcome these limitations, we propose FaRAccel, a novel hardware accelerator architecture implemented on FPGA, specifically designed to offload and optimize FaR operations. FaRAccel integrates reconfigurable logic for dynamic activation rerouting, and lightweight storage of rewiring configurations, enabling low-latency inference with minimal energy overhead. We evaluate FaRAccel across a suite of Transformer models and demonstrate substantial reductions in FaR inference latency and improvement in energy efficiency, while maintaining the robustness gains of the original FaR methodology. To the best of our knowledge, this is the first hardware-accelerated defense against BFAs in Transformers, effectively bridging the gap between algorithmic resilience and efficient deployment on real-world AI platforms.
CROct 28, 2025
Hammering the Diagnosis: Rowhammer-Induced Stealthy Trojan Attacks on ViT-Based Medical ImagingBanafsheh Saber Latibari, Najmeh Nazari, Hossein Sayadi et al.
Vision Transformers (ViTs) have emerged as powerful architectures in medical image analysis, excelling in tasks such as disease detection, segmentation, and classification. However, their reliance on large, attention-driven models makes them vulnerable to hardware-level attacks. In this paper, we propose a novel threat model referred to as Med-Hammer that combines the Rowhammer hardware fault injection with neural Trojan attacks to compromise the integrity of ViT-based medical imaging systems. Specifically, we demonstrate how malicious bit flips induced via Rowhammer can trigger implanted neural Trojans, leading to targeted misclassification or suppression of critical diagnoses (e.g., tumors or lesions) in medical scans. Through extensive experiments on benchmark medical imaging datasets such as ISIC, Brain Tumor, and MedMNIST, we show that such attacks can remain stealthy while achieving high attack success rates about 82.51% and 92.56% in MobileViT and SwinTransformer, respectively. We further investigate how architectural properties, such as model sparsity, attention weight distribution, and the number of features of the layer, impact attack effectiveness. Our findings highlight a critical and underexplored intersection between hardware-level faults and deep learning security in healthcare applications, underscoring the urgent need for robust defenses spanning both model architectures and underlying hardware platforms.
SPSep 15, 2025
Self-Supervised and Topological Signal-Quality Assessment for Any PPG DeviceWei Shao, Ruoyu Zhang, Zequan Liang et al.
Wearable photoplethysmography (PPG) is embedded in billions of devices, yet its optical waveform is easily corrupted by motion, perfusion loss, and ambient light, jeopardizing downstream cardiometric analytics. Existing signal-quality assessment (SQA) methods rely either on brittle heuristics or on data-hungry supervised models. We introduce the first fully unsupervised SQA pipeline for wrist PPG. Stage 1 trains a contrastive 1-D ResNet-18 on 276 h of raw, unlabeled data from heterogeneous sources (varying in device and sampling frequency), yielding optical-emitter- and motion-invariant embeddings (i.e., the learned representation is stable across differences in LED wavelength, drive intensity, and device optics, as well as wrist motion). Stage 2 converts each 512-D encoder embedding into a 4-D topological signature via persistent homology (PH) and clusters these signatures with HDBSCAN. To produce a binary signal-quality index (SQI), the acceptable PPG signals are represented by the densest cluster while the remaining clusters are assumed to mainly contain poor-quality PPG signals. Without re-tuning, the SQI attains Silhouette, Davies-Bouldin, and Calinski-Harabasz scores of 0.72, 0.34, and 6173, respectively, on a stratified sample of 10,000 windows. In this study, we propose a hybrid self-supervised-learning--topological-data-analysis (SSL--TDA) framework that offers a drop-in, scalable, cross-device quality gate for PPG signals.
MTRL-SCIApr 5, 2025
Machine Learning Reveals Composition Dependent Thermal Stability in Halide PerovskitesAbigail R. Hering, Mansha Dubey, Elahe Hosseini et al.
Halide perovskites exhibit unpredictable properties in response to environmental stressors, due to several composition-dependent degradation mechanisms. In this work, we apply data visualization and machine learning (ML) techniques to reveal unexpected correlations between composition, temperature, and material properties while using high throughput, in situ environmental photoluminescence (PL) experiments. Correlation heatmaps show the strong influence of Cs content on film degradation, and dimensionality reduction visualization methods uncover clear composition-based data clusters. An extreme gradient boosting algorithm (XGBoost) effectively forecasts PL features for ten perovskite films with both composition-agnostic (>85% accuracy) and composition-dependent (>75% accuracy) model approaches, while elucidating the relative feature importance of composition (up to 99%). This model validates a previously unseen anti-correlation between Cs content and material thermal stability. Our ML-based framework can be expanded to any perovskite family, significantly reducing the analysis time currently employed to identify stable options for photovoltaics.
CRSep 4, 2020
NNgSAT: Neural Network guided SAT Attack on Logic Locked Complex StructuresKimia Zamiri Azar, Hadi Mardani Kamali, Houman Homayoun et al.
The globalization of the IC supply chain has raised many security threats, especially when untrusted parties are involved. This has created a demand for a dependable logic obfuscation solution to combat these threats. Amongst a wide range of threats and countermeasures on logic obfuscation in the 2010s decade, the Boolean satisfiability (SAT) attack, or one of its derivatives, could break almost all state-of-the-art logic obfuscation countermeasures. However, in some cases, particularly when the logic locked circuits contain complex structures, such as big multipliers, large routing networks, or big tree structures, the logic locked circuit is hard-to-be-solved for the SAT attack. Usage of these structures for obfuscation may lead a strong defense, as many SAT solvers fail to handle such complexity. However, in this paper, we propose a neural-network-guided SAT attack (NNgSAT), in which we examine the capability and effectiveness of a message-passing neural network (MPNN) for solving these complex structures (SAT-hard instances). In NNgSAT, after being trained as a classifier to predict SAT/UNSAT on a SAT problem (NN serves as a SAT solver), the neural network is used to guide/help the actual SAT solver for finding the SAT assignment(s). By training NN on conjunctive normal forms (CNFs) corresponded to a dataset of logic locked circuits, as well as fine-tuning the confidence rate of the NN prediction, our experiments show that NNgSAT could solve 93.5% of the logic locked circuits containing complex structures within a reasonable time, while the existing SAT attack cannot proceed the attack flow in them.
CRSep 4, 2020
InterLock: An Intercorrelated Logic and Routing LockingHadi Mardani Kamali, Kimia Zamiri Azar, Houman Homayoun et al.
In this paper, we propose a canonical prune-and-SAT (CP&SAT) attack for breaking state-of-the-art routing-based obfuscation techniques. In the CP&SAT attack, we first encode the key-programmable routing blocks (keyRBs) based on an efficient SAT encoding mechanism suited for detailed routing constraints, and then efficiently re-encode and reduce the CNF corresponded to the keyRB using a bounded variable addition (BVA) algorithm. In the CP&SAT attack, this is done before subjecting the circuit to the SAT attack. We illustrate that this encoding and BVA-based pre-processing significantly reduces the size of the CNF corresponded to the routing-based obfuscated circuit, in the result of which we observe 100% success rate for breaking prior art routing-based obfuscation techniques. Further, we propose a new intercorrelated logic and routing locking technique, or in short InterLock, as a countermeasure to mitigate the CP&SAT attack. In Interlock, in addition to hiding the connectivity, a part of the logic (gates) in the selected timing paths are also implemented in the keyRB(s). We illustrate that when the logic gates are twisted with keyRBs, the BVA could not provide any advantage as a pre-processing step. Our experimental results show that, by using InterLock, with only three 8$\times$8 or only two 16x16 keyRBs (twisted with actual logic gates), the resilience against existing attacks as well as our new proposed CP&SAT attack would be guaranteed while, on average, the delay/area overhead is less than 10% for even medium-size benchmark circuits.
LGJun 29, 2020
Conditional Classification: A Solution for Computational Energy ReductionAli Mirzaeian, Sai Manoj, Ashkan Vakil et al.
Deep convolutional neural networks have shown high efficiency in computer visions and other applications. However, with the increase in the depth of the networks, the computational complexity is growing exponentially. In this paper, we propose a novel solution to reduce the computational complexity of convolutional neural network models used for many class image classification. Our proposed technique breaks the classification task into two steps: 1) coarse-grain classification, in which the input samples are classified among a set of hyper-classes, 2) fine-grain classification, in which the final labels are predicted among those hyper-classes detected at the first step. We illustrate that our proposed classifier can reach the level of accuracy reported by the best in class classification models with less computational complexity (Flop Count) by only activating parts of the model that are needed for the image classification.
IVJun 26, 2020
Diverse Knowledge Distillation (DKD): A Solution for Improving The Robustness of Ensemble Models Against Adversarial AttacksAli Mirzaeian, Jana Kosecka, Houman Homayoun et al.
This paper proposes an ensemble learning model that is resistant to adversarial attacks. To build resilience, we introduced a training process where each member learns a radically distinct latent space. Member models are added one at a time to the ensemble. Simultaneously, the loss function is regulated by a reverse knowledge distillation, forcing the new member to learn different features and map to a latent space safely distanced from those of existing members. We assessed the security and performance of the proposed solution on image classification tasks using CIFAR10 and MNIST datasets and showed security and performance improvement compared to the state of the art defense methods.
CRMay 24, 2020
SCRAMBLE: The State, Connectivity and Routing Augmentation Model for Building Logic EncryptionHadi Mardani Kamali, Kimia Zamiri Azar, Houman Homayoun et al.
In this paper, we introduce SCRAMBLE, as a novel logic locking solution for sequential circuits while the access to the scan chain is restricted. The SCRAMBLE could be used to lock an FSM by hiding its state transition graph (STG) among a large number of key-controlled false transitions. Also, it could be used to lock sequential circuits (sequential datapath) by hiding the timing paths' connectivity among a large number of key-controlled false connections. Besides, the structure of SCRAMBLE allows us to engage this scheme as a new scan chain locking solution by hiding the correct scan chain sequence among a large number of the key-controlled false sequences. We demonstrate that the proposed scheme resists against both (1) the 2-stage attacks on FSM, and (2) SAT attacks integrated with unrolling as well as bounded-model-checking. We have discussed two variants of SCRAMBLE: (I) Connectivity SCRAMBLE (SCRAMBLE-C), and (b) Logic SCRAMBLE (SCRAMBLE-L). The SCRAMBLE-C relies on the SAT-hard and key-controlled modules that are built using near non-blocking logarithmic switching networks. The SCRAMBLE-L uses input multiplexing techniques to hide a part of the FSM in a memory. In the result section, we describe the effectiveness of each variant against state-of-the-art attacks.
CRMay 8, 2020
On Designing Secure and Robust Scan Chain for Protecting Obfuscated LogicHadi Mardani Kamali, Kimia Zamiri Azar, Houman Homayoun et al.
In this paper, we assess the security and testability of the state-of-the-art design-for-security (DFS) architectures in the presence of scan-chain locking/obfuscation, a group of solution that has previously proposed to restrict unauthorized access to the scan chain. We discuss the key leakage vulnerability in the recently published prior-art DFS architectures. This leakage relies on the potential glitches in the DFS architecture that could lead the adversary to make a leakage condition in the circuit. Also, we demonstrate that the state-of-the-art DFS architectures impose some substantial architectural drawbacks that moderately affect both test flow and design constraints. We propose a new DFS architecture for building a secure scan chain architecture while addressing the potential of key leakage. The proposed architecture allows the designer to perform the structural test with no limitation, enabling an untrusted foundry to utilize the scan chain for manufacturing fault testing without needing to access the scan chain. Our proposed solution poses negligible limitation/overhead on the test flow, as well as the design criteria.
LGMar 22, 2020
Deep Multi-attributed Graph Translation with Node-Edge Co-evolutionXiaojie Guo, Liang Zhao, Cameron Nowzari et al.
Generalized from image and language translation, graph translation aims to generate a graph in the target domain by conditioning an input graph in the source domain. This promising topic has attracted fast-increasing attention recently. Existing works are limited to either merely predicting the node attributes of graphs with fixed topology or predicting only the graph topology without considering node attributes, but cannot simultaneously predict both of them, due to substantial challenges: 1) difficulty in characterizing the interactive, iterative, and asynchronous translation process of both nodes and edges and 2) difficulty in discovering and maintaining the inherent consistency between the node and edge in predicted graphs. These challenges prevent a generic, end-to-end framework for joint node and edge attributes prediction, which is a need for real-world applications such as malware confinement in IoT networks and structural-to-functional network translation. These real-world applications highly depend on hand-crafting and ad-hoc heuristic models, but cannot sufficiently utilize massive historical data. In this paper, we termed this generic problem "multi-attributed graph translation" and developed a novel framework integrating both node and edge translations seamlessly. The novel edge translation path is generic, which is proven to be a generalization of the existing topology translation models. Then, a spectral graph regularization based on our non-parametric graph Laplacian is proposed in order to learn and maintain the consistency of the predicted nodes and edges. Finally, extensive experiments on both synthetic and real-world application data demonstrated the effectiveness of the proposed method.
CRFeb 18, 2020
DFSSD: Deep Faults and Shallow State Duality, A Provably Strong Obfuscation Solution for Circuits with Restricted Access to Scan ChainShervin Roshanisefat, Hadi Mardani Kamali, Kimia Zamiri Azar et al.
In this paper, we introduce DFSSD, a novel logic locking solution for sequential and FSM circuits with a restricted (locked) access to the scan chain. DFSSD combines two techniques for obfuscation: (1) Deep Faults, and (2) Shallow State Duality. Both techniques are specifically designed to resist against sequential SAT attacks based on bounded model checking. The shallow state duality prevents a sequential SAT attack from taking a shortcut for early termination without running an exhaustive unbounded model checker to assess if the attack could be terminated. The deep fault, on the other hand, provides a designer with a technique for building deep, yet key recoverable faults that could not be discovered by sequential SAT (and bounded model checker based) attacks in a reasonable time.
CRJan 23, 2020
SAT-hard Cyclic Logic Obfuscation for Protecting the IP in the Manufacturing Supply ChainShervin Roshanisefat, Hadi Mardani Kamali, Houman Homayoun et al.
State-of-the-art attacks against cyclic logic obfuscation use satisfiability solvers that are equipped with a set of cycle avoidance clauses. These cycle avoidance clauses are generated in a pre-processing step and define various key combinations that could open or close cycles without making the circuit oscillating or stateful. In this paper, we show that this pre-processing step has to generate cycle avoidance conditions on all cycles in a netlist, otherwise, a missing cycle could trap the solver in an infinite loop or make it exit with an incorrect key. Then, we propose several techniques by which the number of cycles is exponentially increased as a function of the number of inserted feedbacks. We further illustrate that when the number of feedbacks is increased, the pre-processing step of the attack faces an exponential increase in complexity and runtime, preventing the correct composition of cycle avoidance clauses in a reasonable time. On the other hand, if the pre-processing is not concluded, the attack formulated by the satisfiability solver will either get stuck or exit with an incorrect key. Hence, when the cyclic obfuscation under the conditions proposed in this paper is implemented, it would impose an exponentially difficult problem for the satisfiability solver based attacks.
CRJan 17, 2020
LASCA: Learning Assisted Side Channel Delay Analysis for Hardware Trojan DetectionAshkan Vakil, Farnaz Behnia, Ali Mirzaeian et al.
In this paper, we introduce a Learning Assisted Side Channel delay Analysis (LASCA) methodology for Hardware Trojan detection. Our proposed solution, unlike the prior art, does not require a Golden IC. Instead, it trains a Neural Network to act as a process tracking watchdog for correlating the static timing data (produced at design time) to the delay information obtained from clock frequency sweeping (at test time) for the purpose of Trojan detection. Using the LASCA flow, we detect close to 90% of Hardware Trojans in the simulated scenarios.
LGJan 16, 2020
Code-Bridged Classifier (CBC): A Low or Negative Overhead Defense for Making a CNN Classifier Robust Against Adversarial AttacksFarnaz Behnia, Ali Mirzaeian, Mohammad Sabokrou et al.
In this paper, we propose Code-Bridged Classifier (CBC), a framework for making a Convolutional Neural Network (CNNs) robust against adversarial attacks without increasing or even by decreasing the overall models' computational complexity. More specifically, we propose a stacked encoder-convolutional model, in which the input image is first encoded by the encoder module of a denoising auto-encoder, and then the resulting latent representation (without being decoded) is fed to a reduced complexity CNN for image classification. We illustrate that this network not only is more robust to adversarial examples but also has a significantly lower computational complexity when compared to the prior art defenses.
PFOct 14, 2019
TCD-NPE: A Re-configurable and Efficient Neural Processing Engine, Powered by Novel Temporal-Carry-deferring MACsAli Mirzaeian, Houman Homayoun, Avesta Sasan
In this paper, we first propose the design of Temporal-Carry-deferring MAC (TCD-MAC) and illustrate how our proposed solution can gain significant energy and performance benefit when utilized to process a stream of input data. We then propose using the TCD-MAC to build a reconfigurable, high speed, and low power Neural Processing Engine (TCD-NPE). We, further, propose a novel scheduler that lists the sequence of needed processing events to process an MLP model in the least number of computational rounds in our proposed TCD-NPE. We illustrate that our proposed TCD-NPE significantly outperform similar neural processing solutions that use conventional MACs in terms of both energy consumption and execution time.
LGOct 1, 2019
NESTA: Hamming Weight Compression-Based Neural Proc. EngineAli Mirzaeian, Houman Homayoun, Avesta Sasan
In this paper, we present NESTA, a specialized Neural engine that significantly accelerates the computation of convolution layers in a deep convolutional neural network, while reducing the computational energy. NESTA reformats Convolutions into $3 \times 3$ batches and uses a hierarchy of Hamming Weight Compressors to process each batch. Besides, when processing the convolution across multiple channels, NESTA, rather than computing the precise result of a convolution per channel, quickly computes an approximation of its partial sum, and a residual value such that if added to the approximate partial sum, generates the accurate output. Then, instead of immediately adding the residual, it uses (consumes) the residual when processing the next batch in the hamming weight compressors with available capacity. This mechanism shortens the critical path by avoiding the need to propagate carry signals during each round of computation and speeds up the convolution of each channel. In the last stage of computation, when the partial sum of the last channel is computed, NESTA terminates by adding the residual bits to the approximate output to generate a correct result.
CRSep 1, 2019
COMA: Communication and Obfuscation Management ArchitectureKimia Zamiri Azar, Farnoud Farahmand, Hadi Mardani Kamali et al.
In this paper, we introduce a novel Communication and Obfuscation Management Architecture (COMA) to handle the storage of the obfuscation key and to secure the communication to/from untrusted yet obfuscated circuits. COMA addresses three challenges related to the obfuscated circuits: First, it removes the need for the storage of the obfuscation unlock key at the untrusted chip. Second, it implements a mechanism by which the key sent for unlocking an obfuscated circuit changes after each activation (even for the same device), transforming the key into a dynamically changing license. Third, it protects the communication to/from the COMA protected device and additionally introduces two novel mechanisms for the exchange of data to/from COMA protected architectures: (1) a highly secure but slow double encryption, which is used for exchange of key and sensitive data (2) a high-performance and low-energy yet leaky encryption, secured by means of frequent key renewal. We demonstrate that compared to state-of-the-art key management architectures, COMA reduces the area overhead by 14%, while allowing additional features including unique chip authentication, enabling activation as a service (for IoT devices), reducing the side channel threats on key management architecture, and providing two new means of secure communication to/from an untrusted chip.
LGAug 22, 2019
DynGraph2Seq: Dynamic-Graph-to-Sequence Interpretable Learning for Health Stage Prediction in Online Health ForumsYuyang Gao, Lingfei Wu, Houman Homayoun et al.
Online health communities such as the online breast cancer forum enable patients (i.e., users) to interact and help each other within various subforums, which are subsections of the main forum devoted to specific health topics. The changing nature of the users' activities in different subforums can be strong indicators of their health status changes. This additional information could allow health-care organizations to respond promptly and provide additional help for the patient. However, modeling complex transitions of an individual user's activities among different subforums over time and learning how these correspond to his/her health stage are extremely challenging. In this paper, we first formulate the transition of user activities as a dynamic graph with multi-attributed nodes, then formalize the health stage inference task as a dynamic graph-to-sequence learning problem, and hence propose a novel dynamic graph-to-sequence neural networks architecture (DynGraph2Seq) to address all the challenges. Our proposed DynGraph2Seq model consists of a novel dynamic graph encoder and an interpretable sequence decoder that learn the mapping between a sequence of time-evolving user activity graphs and a sequence of target health stages. We go on to propose dynamic graph hierarchical attention mechanisms to facilitate the necessary multi-level interpretability. A comprehensive experimental analysis of its use for a health stage prediction task demonstrates both the effectiveness and the interpretability of the proposed models.
ARJul 29, 2019
Pyramid: Machine Learning Framework to Estimate the Optimal Timing and Resource Usage of a High-Level Synthesis DesignHosein Mohammadi Makrani, Farnoud Farahmand, Hossein Sayadi et al.
The emergence of High-Level Synthesis (HLS) tools shifted the paradigm of hardware design by making the process of mapping high-level programming languages to hardware design such as C to VHDL/Verilog feasible. HLS tools offer a plethora of techniques to optimize designs for both area and performance, but resource usage and timing reports of HLS tools mostly deviate from the post-implementation results. In addition, to evaluate a hardware design performance, it is critical to determine the maximum achievable clock frequency. Obtaining such information using static timing analysis provided by CAD tools is difficult, due to the multitude of tool options. Moreover, a binary search to find the maximum frequency is tedious, time-consuming, and often does not obtain the optimal result. To address these challenges, we propose a framework, called Pyramid, that uses machine learning to accurately estimate the optimal performance and resource utilization of an HLS design. For this purpose, we first create a database of C-to-FPGA results from a diverse set of benchmarks. To find the achievable maximum clock frequency, we use Minerva, which is an automated hardware optimization tool. Minerva determines the close-to-optimal settings of tools, using static timing analysis and a heuristic algorithm, and targets either optimal throughput or throughput-to-area. Pyramid uses the database to train an ensemble machine learning model to map the HLS-reported features to the results of Minerva. To this end, Pyramid re-calibrates the results of HLS to bridge the accuracy gap and enable developers to estimate the throughput or throughput-to-area of hardware design with more than 95% accuracy and alleviates the need to perform actual implementation for estimation.
LGJul 7, 2019
Resource-Efficient Wearable Computing for Real-Time Reconfigurable Machine Learning: A Cascading Binary ClassificationMahdi Pedram, Seyed Ali Rokni, Marjan Nourollahi et al.
Advances in embedded systems have enabled integration of many lightweight sensory devices within our daily life. In particular, this trend has given rise to continuous expansion of wearable sensors in a broad range of applications from health and fitness monitoring to social networking and military surveillance. Wearables leverage machine learning techniques to profile behavioral routine of their end-users through activity recognition algorithms. Current research assumes that such machine learning algorithms are trained offline. In reality, however, wearables demand continuous reconfiguration of their computational algorithms due to their highly dynamic operation. Developing a personalized and adaptive machine learning model requires real-time reconfiguration of the model. Due to stringent computation and memory constraints of these embedded sensors, the training/re-training of the computational algorithms need to be memory- and computation-efficient. In this paper, we propose a framework, based on the notion of online learning, for real-time and on-device machine learning training. We propose to transform the activity recognition problem from a multi-class classification problem to a hierarchical model of binary decisions using cascading online binary classifiers. Our results, based on Pegasos online learning, demonstrate that the proposed approach achieves 97% accuracy in detecting activities of varying intensities using a limited memory while power usages of the system is reduced by more than 40%.
CRMay 15, 2019
Threats on Logic Locking: A Decade LaterKimia Zamiri Azar, Hadi Mardani Kamali, Houman Homayoun et al.
To reduce the cost of ICs and to meet the market's demand, a considerable portion of manufacturing supply chain, including silicon fabrication, packaging and testing may be pushed offshore. Utilizing a global IC manufacturing supply chain, and inclusion of non-trusted parties in the supply chain has raised concerns over security and trust related challenges including those of overproduction, counterfeiting, IP piracy, and Hardware Trojans to name a few. To reduce the risk of IC manufacturing in an untrusted and globally distributed supply chain, the researchers have proposed various locking and obfuscation mechanisms for hiding the functionality of the ICs during the manufacturing, that requires the activation of the IP after fabrication using the key value(s) that is only known to the IP/IC owner. At the same time, many such proposed obfuscation and locking mechanisms are broken with attacks that exploit the inherent vulnerabilities in such solutions. The past decade of research in this area, has resulted in many such defense and attack solutions. In this paper, we review a decade of research on hardware obfuscation from an attacker perspective, elaborate on attack and defense lessons learned, and discuss future directions that could be exploited for building stronger defenses.
CRFeb 14, 2019
Estimating the Circuit Deobfuscating Runtime based on Graph Deep LearningZhiqian Chen, Gaurav Kolhe, Setareh Rafatirad et al.
Circuit obfuscation is a recently proposed defense mechanism to protect digital integrated circuits (ICs) from reverse engineering by using camouflaged gates i.e., logic gates whose functionality cannot be precisely determined by the attacker. There have been effective schemes such as satisfiability-checking (SAT)-based attacks that can potentially decrypt obfuscated circuits, called deobfuscation. Deobfuscation runtime could have a large span ranging from few milliseconds to thousands of years or more, depending on the number and layouts of the ICs and camouflaged gates. And hence accurately pre-estimating the deobfuscation runtime is highly crucial for the defenders to maximize it and optimize their defense. However, estimating the deobfuscation runtime is a challenging task due to 1) the complexity and heterogeneity of graph-structured circuit, 2) the unknown and sophisticated mechanisms of the attackers for deobfuscation. To address the above mentioned challenges, this work proposes the first machine-learning framework that predicts the deobfuscation runtime based on graph deep learning techniques. Specifically, we design a new model, ICNet with new input and convolution layers to characterize and extract graph frequencies from ICs, which are then integrated by heterogeneous deep fully-connected layers to obtain final output. ICNet is an end-to-end framework which can automatically extract the determinant features for deobfuscation runtime. Extensive experiments demonstrate its effectiveness and efficiency.
CRApr 30, 2018
Benchmarking the Capabilities and Limitations of SAT Solvers in Defeating Obfuscation SchemesShervin Roshanisefat, Harshith K. Thirumala, Kris Gaj et al.
In this paper, we investigate the strength of six different SAT solvers in attacking various obfuscation schemes. Our investigation revealed that Glucose and Lingeling SAT solvers are generally suited for attacking small-to-midsize obfuscated circuits, while the MapleGlucose, if the system is not memory bound, is best suited for attacking mid-to-difficult obfuscation methods. Our experimental result indicates that when dealing with extremely large circuits and very difficult obfuscation problems, the SAT solver may be memory bound, and Lingeling, for having the most memory efficient implementation, is the best-suited solver for such problems. Additionally, our investigation revealed that SAT solver execution times may vary widely across different SAT solvers. Hence, when testing the hardness of an obfuscation method, although the increase in difficulty could be verified by one SAT solver, the pace of increase in difficulty is dependent on the choice of a SAT solver.
CRApr 30, 2018
LUT-Lock: A Novel LUT-based Logic Obfuscation for FPGA-Bitstream and ASIC-Hardware ProtectionHadi Mardani Kamali, Kimia Zamiri Azar, Kris Gaj et al.
In this work, we propose LUT-Lock, a novel Look-Up-Table-based netlist obfuscation algorithm, for protecting the intellectual property that is mapped to an FPGA bitstream or an ASIC netlist. We, first, illustrate the effectiveness of several key features that make the LUT-based obfuscation more resilient against SAT attacks and then we embed the proposed key features into our proposed LUT-Lock algorithm. We illustrates that LUT-Lock maximizes the resiliency of the LUT-based obfuscation against SAT attacks by forcing a near exponential increase in the execution time of a SAT solver with respect to the number of obfuscated gates. Hence, by adopting LUT-Lock algorithm, SAT attack execution time could be made unreasonably long by increasing the number of utilized LUTs.