LGMay 7, 2022Code
Bandits for Structure Perturbation-based Black-box Attacks to Graph Neural Networks with Theoretical GuaranteesBinghui Wang, Youqi Li, Pan Zhou
Graph neural networks (GNNs) have achieved state-of-the-art performance in many graph-based tasks such as node classification and graph classification. However, many recent works have demonstrated that an attacker can mislead GNN models by slightly perturbing the graph structure. Existing attacks to GNNs are either under the less practical threat model where the attacker is assumed to access the GNN model parameters, or under the practical black-box threat model but consider perturbing node features that are shown to be not enough effective. In this paper, we aim to bridge this gap and consider black-box attacks to GNNs with structure perturbation as well as with theoretical guarantees. We propose to address this challenge through bandit techniques. Specifically, we formulate our attack as an online optimization with bandit feedback. This original problem is essentially NP-hard due to the fact that perturbing the graph structure is a binary optimization problem. We then propose an online attack based on bandit optimization which is proven to be {sublinear} to the query number $T$, i.e., $\mathcal{O}(\sqrt{N}T^{3/4})$ where $N$ is the number of nodes in the graph. Finally, we evaluate our proposed attack by conducting experiments over multiple datasets and GNN models. The experimental results on various citation graphs and image graphs show that our attack is both effective and efficient. Source code is available at~\url{https://github.com/Metaoblivion/Bandit_GNN_Attack}
LGApr 10, 2023Code
Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable ConfidenceHanbin Hong, Xinyu Zhang, Binghui Wang et al.
Black-box adversarial attacks have demonstrated strong potential to compromise machine learning models by iteratively querying the target model or leveraging transferability from a local surrogate model. Recently, such attacks can be effectively mitigated by state-of-the-art (SOTA) defenses, e.g., detection via the pattern of sequential queries, or injecting noise into the model. To our best knowledge, we take the first step to study a new paradigm of black-box attacks with provable guarantees -- certifiable black-box attacks that can guarantee the attack success probability (ASP) of adversarial examples before querying over the target model. This new black-box attack unveils significant vulnerabilities of machine learning models, compared to traditional empirical black-box attacks, e.g., breaking strong SOTA defenses with provable confidence, constructing a space of (infinite) adversarial examples with high ASP, and the ASP of the generated adversarial examples is theoretically guaranteed without verification/queries over the target model. Specifically, we establish a novel theoretical foundation for ensuring the ASP of the black-box attack with randomized adversarial examples (AEs). Then, we propose several novel techniques to craft the randomized AEs while reducing the perturbation size for better imperceptibility. Finally, we have comprehensively evaluated the certifiable black-box attacks on the CIFAR10/100, ImageNet, and LibriSpeech datasets, while benchmarking with 16 SOTA black-box attacks, against various SOTA defenses in the domains of computer vision and speech recognition. Both theoretical and experimental results have validated the significance of the proposed attack. The code and all the benchmarks are available at \url{https://github.com/datasec-lab/CertifiedAttack}.
CRJul 31, 2023
Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial AttacksXinyu Zhang, Hanbin Hong, Yuan Hong et al.
The language models, especially the basic text classification models, have been shown to be susceptible to textual adversarial attacks such as synonym substitution and word insertion attacks. To defend against such attacks, a growing body of research has been devoted to improving the model robustness. However, providing provable robustness guarantees instead of empirical robustness is still widely unexplored. In this paper, we propose Text-CRS, a generalized certified robustness framework for natural language processing (NLP) based on randomized smoothing. To our best knowledge, existing certified schemes for NLP can only certify the robustness against $\ell_0$ perturbations in synonym substitution attacks. Representing each word-level adversarial operation (i.e., synonym substitution, word reordering, insertion, and deletion) as a combination of permutation and embedding transformation, we propose novel smoothing theorems to derive robustness bounds in both permutation and embedding space against such adversarial operations. To further improve certified accuracy and radius, we consider the numerical relationships between discrete words and select proper noise distributions for the randomized smoothing. Finally, we conduct substantial experiments on multiple language models and datasets. Text-CRS can address all four different word-level adversarial operations and achieve a significant accuracy improvement. We also provide the first benchmark on certified accuracy and radius of four word-level operations, besides outperforming the state-of-the-art certification against synonym substitution attacks.
CVMar 24, 2023
IDGI: A Framework to Eliminate Explanation Noise from Integrated GradientsRuo Yang, Binghui Wang, Mustafa Bilgic
Integrated Gradients (IG) as well as its variants are well-known techniques for interpreting the decisions of deep neural networks. While IG-based approaches attain state-of-the-art performance, they often integrate noise into their explanation saliency maps, which reduce their interpretability. To minimize the noise, we examine the source of the noise analytically and propose a new approach to reduce the explanation noise based on our analytical findings. We propose the Important Direction Gradient Integration (IDGI) framework, which can be easily incorporated into any IG-based method that uses the Reimann Integration for integrated gradient computation. Extensive experiments with three IG-based methods show that IDGI improves them drastically on numerous interpretability metrics.
LGJul 5, 2022
UniCR: Universally Approximated Certified Robustness via Randomized SmoothingHanbin Hong, Binghui Wang, Yuan Hong
We study certified robustness of machine learning classifiers against adversarial perturbations. In particular, we propose the first universally approximated certified robustness (UniCR) framework, which can approximate the robustness certification of any input on any classifier against any $\ell_p$ perturbations with noise generated by any continuous probability distribution. Compared with the state-of-the-art certified defenses, UniCR provides many significant benefits: (1) the first universal robustness certification framework for the above 4 'any's; (2) automatic robustness certification that avoids case-by-case analysis, (3) tightness validation of certified robustness, and (4) optimality validation of noise distributions used by randomized smoothing. We conduct extensive experiments to validate the above benefits of UniCR and the advantages of UniCR over state-of-the-art certified defenses against $\ell_p$ perturbations.
CRJun 11, 2022
NeuGuard: Lightweight Neuron-Guided Defense against Membership Inference AttacksNuo Xu, Binghui Wang, Ran Ran et al.
Membership inference attacks (MIAs) against machine learning models can lead to serious privacy risks for the training dataset used in the model training. In this paper, we propose a novel and effective Neuron-Guided Defense method named NeuGuard against membership inference attacks (MIAs). We identify a key weakness in existing defense mechanisms against MIAs wherein they cannot simultaneously defend against two commonly used neural network based MIAs, indicating that these two attacks should be separately evaluated to assure the defense effectiveness. We propose NeuGuard, a new defense approach that jointly controls the output and inner neurons' activation with the object to guide the model output of training set and testing set to have close distributions. NeuGuard consists of class-wise variance minimization targeting restricting the final output neurons and layer-wise balanced output control aiming to constrain the inner neurons in each layer. We evaluate NeuGuard and compare it with state-of-the-art defenses against two neural network based MIAs, five strongest metric based MIAs including the newly proposed label-only MIA on three benchmark datasets. Results show that NeuGuard outperforms the state-of-the-art defenses by offering much improved utility-privacy trade-off, generality, and overhead.
CVApr 5, 2023
A Certified Radius-Guided Attack Framework to Image Segmentation ModelsWenjie Qu, Youqi Li, Binghui Wang
Image segmentation is an important problem in many safety-critical applications. Recent studies show that modern image segmentation models are vulnerable to adversarial perturbations, while existing attack methods mainly follow the idea of attacking image classification models. We argue that image segmentation and classification have inherent differences, and design an attack framework specially for image segmentation models. Our attack framework is inspired by certified radius, which was originally used by defenders to defend against adversarial perturbations to classification models. We are the first, from the attacker perspective, to leverage the properties of certified radius and propose a certified radius guided attack framework against image segmentation models. Specifically, we first adapt randomized smoothing, the state-of-the-art certification method for classification models, to derive the pixel's certified radius. We then focus more on disrupting pixels with relatively smaller certified radii and design a pixel-wise certified radius guided loss, when plugged into any existing white-box attack, yields our certified radius-guided white-box attack. Next, we propose the first black-box attack to image segmentation models via bandit. We design a novel gradient estimator, based on bandit feedback, which is query-efficient and provably unbiased and stable. We use this gradient estimator to design a projected bandit gradient descent (PBGD) attack, as well as a certified radius-guided PBGD (CR-PBGD) attack. We prove our PBGD and CR-PBGD attacks can achieve asymptotically optimal attack performance with an optimal rate. We evaluate our certified-radius guided white-box and black-box attacks on multiple modern image segmentation models and datasets. Our results validate the effectiveness of our certified radius-guided attack framework.
CRMar 3Code
On Google's SynthID-Text LLM Watermarking System: Theoretical Analysis and Empirical ValidationRomina Omidi, Yun Dong, Binghui Wang
Google's SynthID-Text, the first ever production-ready generative watermark system for large language model, designs a novel Tournament-based method that achieves the state-of-the-art detectability for identifying AI-generated texts. The system's innovation lies in: 1) a new Tournament sampling algorithm for watermarking embedding, 2) a detection strategy based on the introduced score function (e.g., Bayesian or mean score), and 3) a unified design that supports both distortionary and non-distortionary watermarking methods. This paper presents the first theoretical analysis of SynthID-Text, with a focus on its detection performance and watermark robustness, complemented by empirical validation. For example, we prove that the mean score is inherently vulnerable to increased tournament layers, and design a layer inflation attack to break SynthID-Text. We also prove the Bayesian score offers improved watermark robustness w.r.t. layers and further establish that the optimal Bernoulli distribution for watermark detection is achieved when the parameter is set to 0.5. Together, these theoretical and empirical insights not only deepen our understanding of SynthID-Text, but also open new avenues for analyzing effective watermark removal strategies and designing robust watermarking techniques. Source code is available at https: //github.com/romidi80/Synth-ID-Empirical-Analysis.
AIMar 18
Contrastive Reasoning Alignment: Reinforcement Learning from Hidden RepresentationsHaozheng Luo, Yimin Wang, Jiahao Yu et al.
We propose CRAFT, a red-teaming alignment framework that leverages model reasoning capabilities and hidden representations to improve robustness against jailbreak attacks. Unlike prior defenses that operate primarily at the output level, CRAFT aligns large reasoning models to generate safety-aware reasoning traces by explicitly optimizing objectives defined over the hidden state space. Methodologically, CRAFT integrates contrastive representation learning with reinforcement learning to separate safe and unsafe reasoning trajectories, yielding a latent-space geometry that supports robust, reasoning-level safety alignment. Theoretically, we show that incorporating latent-textual consistency into GRPO eliminates superficially aligned policies by ruling them out as local optima. Empirically, we evaluate CRAFT on multiple safety benchmarks using two strong reasoning models, Qwen3-4B-Thinking and R1-Distill-Llama-8B, where it consistently outperforms state-of-the-art defenses such as IPO and SafeKey. Notably, CRAFT delivers an average 79.0% improvement in reasoning safety and 87.7% improvement in final-response safety over the base models, demonstrating the effectiveness of hidden-space reasoning alignment.
LGJul 20, 2024
Universally Harmonizing Differential Privacy Mechanisms for Federated Learning: Boosting Accuracy and ConvergenceShuya Feng, Meisam Mohammady, Hanbin Hong et al.
Differentially private federated learning (DP-FL) is a promising technique for collaborative model training while ensuring provable privacy for clients. However, optimizing the tradeoff between privacy and accuracy remains a critical challenge. To our best knowledge, we propose the first DP-FL framework (namely UDP-FL), which universally harmonizes any randomization mechanism (e.g., an optimal one) with the Gaussian Moments Accountant (viz. DP-SGD) to significantly boost accuracy and convergence. Specifically, UDP-FL demonstrates enhanced model performance by mitigating the reliance on Gaussian noise. The key mediator variable in this transformation is the Rényi Differential Privacy notion, which is carefully used to harmonize privacy budgets. We also propose an innovative method to theoretically analyze the convergence for DP-FL (including our UDP-FL ) based on mode connectivity analysis. Moreover, we evaluate our UDP-FL through extensive experiments benchmarked against state-of-the-art (SOTA) methods, demonstrating superior performance on both privacy guarantees and model performance. Notably, UDP-FL exhibits substantial resilience against different inference attacks, indicating a significant advance in safeguarding sensitive data in federated learning environments.
CRMay 18
MoCo-EA: Exploiting Adversarial Mode Connectivity for Efficient Evolutionary AttacksHyo Seo Kim, Gang Luo, Can Chen et al.
Evolutionary algorithms for adversarial attacks leverage population-based search to discover perturbations without gradient information, but suffer from inefficient crossover operations that destroy adversarial properties through discrete interpolation. We introduce Mode Connectivity Evolutionary Attack (MoCo-EA), which replaces traditional crossover with a novel Bézier crossover operator that optimizes perturbations along a continuous Bézier curve between parent perturbations. Our key insight is that adversarial examples lie on connected manifolds where intermediate points maintain and often enhance attack effectiveness. We demonstrate three findings: (1) Successful adversarial perturbations exhibit mode connectivity; (2) Intermediate points along optimized paths achieve higher transferability than endpoints; (3) Bézier crossover dramatically outperforms discrete genetic operations while reducing convergence time and query requirements. By exploiting the geometric structure of adversarial space through path optimization, MoCo-EA provides an efficient and reliable method. Our work challenges the traditional view of adversarial examples as isolated points and opens new directions for both attack generation and defense research.
LGMay 18
Attention Sinks and Outliers in Attention ResidualsHaozheng Luo, Haoran Dai, Shaoyang Zhang et al.
We propose OASIS, an outlier- and sink-aware technique built on inter-layer null signaling. As AttnResidual architectures introduce an additional depth-wise normalization channel, they improve inter-layer routing flexibility but also exacerbate attention sinks, activation outliers, and the resulting degradation in inference stability and quantization robustness. OASIS addresses this issue by introducing a Softmax1-based null space and coupling token-level null evidence to depth routing through an inter-layer null signal, thereby reducing sink-dominated routing and improving structural robustness. Theoretically, we show that the dual-normalization design of AttnResidual intensifies sink formation and quantization brittleness. Experimentally, we compare OASIS against five baselines on three real-world datasets and observe consistent improvements in both attention sink and post-quantization performance. Notably, OASIS achieves an average reduction of 9.26% in maximum infinity norm and 2.60% in average kurtosis across the evaluated settings, while lowering perplexity by 75.85% under W8A8 and improving GSM8K Pass@1 by 12.42% under W4A4.
CRAug 22, 2024
Understanding Data Reconstruction Leakage in Federated Learning from a Theoretical PerspectiveZifan Wang, Binghui Zhang, Meng Pang et al.
Federated learning (FL) is an emerging collaborative learning paradigm that aims to protect data privacy. Unfortunately, recent works show FL algorithms are vulnerable to the serious data reconstruction attacks. However, existing works lack a theoretical foundation on to what extent the devices' data can be reconstructed and the effectiveness of these attacks cannot be compared fairly due to their unstable performance. To address this deficiency, we propose a theoretical framework to understand data reconstruction attacks to FL. Our framework involves bounding the data reconstruction error and an attack's error bound reflects its inherent attack effectiveness. Under the framework, we can theoretically compare the effectiveness of existing attacks. For instance, our results on multiple datasets validate that the iDLG attack inherently outperforms the DLG attack.
LGOct 22, 2025Code
Towards Strong Certified Defense with Universal Asymmetric RandomizationHanbin Hong, Ashish Kundu, Ali Payani et al.
Randomized smoothing has become essential for achieving certified adversarial robustness in machine learning models. However, current methods primarily use isotropic noise distributions that are uniform across all data dimensions, such as image pixels, limiting the effectiveness of robustness certification by ignoring the heterogeneity of inputs and data dimensions. To address this limitation, we propose UCAN: a novel technique that \underline{U}niversally \underline{C}ertifies adversarial robustness with \underline{A}nisotropic \underline{N}oise. UCAN is designed to enhance any existing randomized smoothing method, transforming it from symmetric (isotropic) to asymmetric (anisotropic) noise distributions, thereby offering a more tailored defense against adversarial attacks. Our theoretical framework is versatile, supporting a wide array of noise distributions for certified robustness in different $\ell_p$-norms and applicable to any arbitrary classifier by guaranteeing the classifier's prediction over perturbed inputs with provable robustness bounds through tailored noise injection. Additionally, we develop a novel framework equipped with three exemplary noise parameter generators (NPGs) to optimally fine-tune the anisotropic noise parameters for different data dimensions, allowing for pursuing different levels of robustness enhancements in practice.Empirical evaluations underscore the significant leap in UCAN's performance over existing state-of-the-art methods, demonstrating up to $182.6\%$ improvement in certified accuracy at large certified radii on MNIST, CIFAR10, and ImageNet datasets.\footnote{Code is anonymously available at \href{https://github.com/youbin2014/UCAN/}{https://github.com/youbin2014/UCAN/}}
CRSep 1, 2020Code
Efficient, Direct, and Restricted Black-Box Graph Evasion Attacks to Any-Layer Graph Neural Networks via Influence FunctionBinghui Wang, Tianxiang Zhou, Minhua Lin et al.
Graph neural network (GNN), the mainstream method to learn on graph data, is vulnerable to graph evasion attacks, where an attacker slightly perturbing the graph structure can fool trained GNN models. Existing work has at least one of the following drawbacks: 1) limited to directly attack two-layer GNNs; 2) inefficient; and 3) impractical, as they need to know full or part of GNN model parameters. We address the above drawbacks and propose an influence-based \emph{efficient, direct, and restricted black-box} evasion attack to \emph{any-layer} GNNs. Specifically, we first introduce two influence functions, i.e., feature-label influence and label influence, that are defined on GNNs and label propagation (LP), respectively. Then we observe that GNNs and LP are strongly connected in terms of our defined influences. Based on this, we can then reformulate the evasion attack to GNNs as calculating label influence on LP, which is \emph{inherently} applicable to any-layer GNNs, while no need to know information about the internal GNN model. Finally, we propose an efficient algorithm to calculate label influence. Experimental results on various graph datasets show that, compared to state-of-the-art white-box attacks, our attack can achieve comparable attack performance, but has a 5-50x speedup when attacking two-layer GNNs. Moreover, our attack is effective to attack multi-layer GNNs\footnote{Source code and full version is in the link: \url{https://github.com/ventr1c/InfAttack}}.
LGDec 20, 2019Code
Certified Robustness for Top-k Predictions against Adversarial Perturbations via Randomized SmoothingJinyuan Jia, Xiaoyu Cao, Binghui Wang et al.
It is well-known that classifiers are vulnerable to adversarial perturbations. To defend against adversarial perturbations, various certified robustness results have been derived. However, existing certified robustnesses are limited to top-1 predictions. In many real-world applications, top-$k$ predictions are more relevant. In this work, we aim to derive certified robustness for top-$k$ predictions. In particular, our certified robustness is based on randomized smoothing, which turns any classifier to a new classifier via adding noise to an input example. We adopt randomized smoothing because it is scalable to large-scale neural networks and applicable to any classifier. We derive a tight robustness in $\ell_2$ norm for top-$k$ predictions when using randomized smoothing with Gaussian noise. We find that generalizing the certified robustness from top-1 to top-$k$ predictions faces significant technical challenges. We also empirically evaluate our method on CIFAR10 and ImageNet. For example, our method can obtain an ImageNet classifier with a certified top-5 accuracy of 62.8\% when the $\ell_2$-norms of the adversarial perturbations are less than 0.5 (=127/255). Our code is publicly available at: \url{https://github.com/jjy1994/Certify_Topk}.
CRFeb 12, 2024
PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language ModelsWei Zou, Runpeng Geng, Binghui Wang et al.
Large language models (LLMs) have achieved remarkable success due to their exceptional generative capabilities. Despite their success, they also have inherent limitations such as a lack of up-to-date knowledge and hallucination. Retrieval-Augmented Generation (RAG) is a state-of-the-art technique to mitigate these limitations. The key idea of RAG is to ground the answer generation of an LLM on external knowledge retrieved from a knowledge database. Existing studies mainly focus on improving the accuracy or efficiency of RAG, leaving its security largely unexplored. We aim to bridge the gap in this work. We find that the knowledge database in a RAG system introduces a new and practical attack surface. Based on this attack surface, we propose PoisonedRAG, the first knowledge corruption attack to RAG, where an attacker could inject a few malicious texts into the knowledge database of a RAG system to induce an LLM to generate an attacker-chosen target answer for an attacker-chosen target question. We formulate knowledge corruption attacks as an optimization problem, whose solution is a set of malicious texts. Depending on the background knowledge (e.g., black-box and white-box settings) of an attacker on a RAG system, we propose two solutions to solve the optimization problem, respectively. Our results show PoisonedRAG could achieve a 90% attack success rate when injecting five malicious texts for each target question into a knowledge database with millions of texts. We also evaluate several defenses and our results show they are insufficient to defend against PoisonedRAG, highlighting the need for new defenses.
CVSep 24, 2024
Leveraging Local Structure for Improving Model Explanations: An Information Propagation ApproachRuo Yang, Binghui Wang, Mustafa Bilgic
Numerous explanation methods have been recently developed to interpret the decisions made by deep neural network (DNN) models. For image classifiers, these methods typically provide an attribution score to each pixel in the image to quantify its contribution to the prediction. However, most of these explanation methods appropriate attribution scores to pixels independently, even though both humans and DNNs make decisions by analyzing a set of closely related pixels simultaneously. Hence, the attribution score of a pixel should be evaluated jointly by considering itself and its structurally-similar pixels. We propose a method called IProp, which models each pixel's individual attribution score as a source of explanatory information and explains the image prediction through the dynamic propagation of information across all pixels. To formulate the information propagation, IProp adopts the Markov Reward Process, which guarantees convergence, and the final status indicates the desired pixels' attribution scores. Furthermore, IProp is compatible with any existing attribution-based explanation method. Extensive experiments on various explanation methods and DNN models verify that IProp significantly improves them on a variety of interpretability metrics.
LGJul 12, 2024
Graph Neural Network Causal Explanation via Neural Causal ModelsArman Behnam, Binghui Wang
Graph neural network (GNN) explainers identify the important subgraph that ensures the prediction for a given graph. Until now, almost all GNN explainers are based on association, which is prone to spurious correlations. We propose {\name}, a GNN causal explainer via causal inference. Our explainer is based on the observation that a graph often consists of a causal underlying subgraph. {\name} includes three main steps: 1) It builds causal structure and the corresponding structural causal model (SCM) for a graph, which enables the cause-effect calculation among nodes. 2) Directly calculating the cause-effect in real-world graphs is computationally challenging. It is then enlightened by the recent neural causal model (NCM), a special type of SCM that is trainable, and design customized NCMs for GNNs. By training these GNN NCMs, the cause-effect can be easily calculated. 3) It uncovers the subgraph that causally explains the GNN predictions via the optimized GNN-NCMs. Evaluation results on multiple synthetic and real-world graphs validate that {\name} significantly outperforms existing GNN explainers in exact groundtruth explanation identification
LGMar 4, 2024
Inf2Guard: An Information-Theoretic Framework for Learning Privacy-Preserving Representations against Inference AttacksSayedeh Leila Noorbakhsh, Binghui Zhang, Yuan Hong et al.
Machine learning (ML) is vulnerable to inference (e.g., membership inference, property inference, and data reconstruction) attacks that aim to infer the private information of training data or dataset. Existing defenses are only designed for one specific type of attack and sacrifice significant utility or are soon broken by adaptive attacks. We address these limitations by proposing an information-theoretic defense framework, called Inf2Guard, against the three major types of inference attacks. Our framework, inspired by the success of representation learning, posits that learning shared representations not only saves time/costs but also benefits numerous downstream tasks. Generally, Inf2Guard involves two mutual information objectives, for privacy protection and utility preservation, respectively. Inf2Guard exhibits many merits: it facilitates the design of customized objectives against the specific inference attack; it provides a general defense framework which can treat certain existing defenses as special cases; and importantly, it aids in deriving theoretical results, e.g., inherent utility-privacy tradeoff and guaranteed privacy leakage. Extensive evaluations validate the effectiveness of Inf2Guard for learning privacy-preserving representations against inference attacks and demonstrate the superiority over the baselines.
LGMar 26, 2024
Identifying Backdoored Graphs in Graph Neural Network Training: An Explanation-Based Approach with Novel MetricsJane Downer, Ren Wang, Binghui Wang
Graph Neural Networks (GNNs) have gained popularity in numerous domains, yet they are vulnerable to backdoor attacks that can compromise their performance and ethical application. The detection of these attacks is crucial for maintaining the reliability and security of GNN classification tasks, but effective detection techniques are lacking. Recognizing the challenge in detecting such intrusions, we devised a novel detection method that creatively leverages graph-level explanations. By extracting and transforming secondary outputs from GNN explanation mechanisms, we developed seven innovative metrics for effective detection of backdoor attacks on GNNs. Additionally, we develop an adaptive attack to rigorously evaluate our approach. We test our method on multiple benchmark datasets and examine its efficacy against various attack models. Our results show that our method can achieve high detection performance, marking a significant advancement in safeguarding GNNs against backdoor attacks.
CRDec 17, 2024
Practicable Black-box Evasion Attacks on Link Prediction in Dynamic Graphs -- A Graph Sequential Embedding MethodJiate Li, Meng Pang, Binghui Wang
Link prediction in dynamic graphs (LPDG) has been widely applied to real-world applications such as website recommendation, traffic flow prediction, organizational studies, etc. These models are usually kept local and secure, with only the interactive interface restrictively available to the public. Thus, the problem of the black-box evasion attack on the LPDG model, where model interactions and data perturbations are restricted, seems to be essential and meaningful in practice. In this paper, we propose the first practicable black-box evasion attack method that achieves effective attacks against the target LPDG model, within a limited amount of interactions and perturbations. To perform effective attacks under limited perturbations, we develop a graph sequential embedding model to find the desired state embedding of the dynamic graph sequences, under a deep reinforcement learning framework. To overcome the scarcity of interactions, we design a multi-environment training pipeline and train our agent for multiple instances, by sharing an aggregate interaction buffer. Finally, we evaluate our attack against three advanced LPDG models on three real-world graph datasets of different scales and compare its performance with related methods under the interaction and perturbation constraints. Experimental results show that our attack is both effective and practicable.
LGDec 15, 2024
Learning Robust and Privacy-Preserving Representations via Information TheoryBinghui Zhang, Sayedeh Leila Noorbakhsh, Yun Dong et al.
Machine learning models are vulnerable to both security attacks (e.g., adversarial examples) and privacy attacks (e.g., private attribute inference). We take the first step to mitigate both the security and privacy attacks, and maintain task utility as well. Particularly, we propose an information-theoretic framework to achieve the goals through the lens of representation learning, i.e., learning representations that are robust to both adversarial examples and attribute inference adversaries. We also derive novel theoretical results under our framework, e.g., the inherent trade-off between adversarial robustness/utility and attribute privacy, and guaranteed attribute privacy leakage against attribute inference adversaries.
CRJan 20
SilentDrift: Exploiting Action Chunking for Stealthy Backdoor Attacks on Vision-Language-Action ModelsBingxin Xu, Yuzhang Shang, Binghui Wang et al.
Vision-Language-Action (VLA) models are increasingly deployed in safety-critical robotic applications, yet their security vulnerabilities remain underexplored. We identify a fundamental security flaw in modern VLA systems: the combination of action chunking and delta pose representations creates an intra-chunk visual open-loop. This mechanism forces the robot to execute K-step action sequences, allowing per-step perturbations to accumulate through integration. We propose SILENTDRIFT, a stealthy black-box backdoor attack exploiting this vulnerability. Our method employs the Smootherstep function to construct perturbations with guaranteed C2 continuity, ensuring zero velocity and acceleration at trajectory boundaries to satisfy strict kinematic consistency constraints. Furthermore, our keyframe attack strategy selectively poisons only the critical approach phase, maximizing impact while minimizing trigger exposure. The resulting poisoned trajectories are visually indistinguishable from successful demonstrations. Evaluated on the LIBERO, SILENTDRIFT achieves a 93.2% Attack Success Rate with a poisoning rate under 2%, while maintaining a 95.3% Clean Task Success Rate.
CRJun 16, 2025
Rectifying Privacy and Efficacy Measurements in Machine Unlearning: A New Inference Attack PerspectiveNima Naderloui, Shenao Yan, Binghui Wang et al.
Machine unlearning focuses on efficiently removing specific data from trained models, addressing privacy and compliance concerns with reasonable costs. Although exact unlearning ensures complete data removal equivalent to retraining, it is impractical for large-scale models, leading to growing interest in inexact unlearning methods. However, the lack of formal guarantees in these methods necessitates the need for robust evaluation frameworks to assess their privacy and effectiveness. In this work, we first identify several key pitfalls of the existing unlearning evaluation frameworks, e.g., focusing on average-case evaluation or targeting random samples for evaluation, incomplete comparisons with the retraining baseline. Then, we propose RULI (Rectified Unlearning Evaluation Framework via Likelihood Inference), a novel framework to address critical gaps in the evaluation of inexact unlearning methods. RULI introduces a dual-objective attack to measure both unlearning efficacy and privacy risks at a per-sample granularity. Our findings reveal significant vulnerabilities in state-of-the-art unlearning methods, where RULI achieves higher attack success rates, exposing privacy risks underestimated by existing methods. Built on a game-based foundation and validated through empirical evaluations on both image and text data (spanning tasks from classification to generation), RULI provides a rigorous, scalable, and fine-grained methodology for evaluating unlearning techniques.
LGMar 24, 2025
Deterministic Certification of Graph Neural Networks against Graph Poisoning Attacks with Arbitrary PerturbationsJiate Li, Meng Pang, Yun Dong et al.
Graph neural networks (GNNs) are becoming the de facto method to learn on the graph data and have achieved the state-of-the-art on node and graph classification tasks. However, recent works show GNNs are vulnerable to training-time poisoning attacks -- marginally perturbing edges, nodes, or/and node features of training graph(s) can largely degrade GNNs' testing performance. Most previous defenses against graph poisoning attacks are empirical and are soon broken by adaptive / stronger ones. A few provable defenses provide robustness guarantees, but have large gaps when applied in practice: 1) restrict the attacker on only one type of perturbation; 2) design for a particular GNN architecture or task; and 3) robustness guarantees are not 100\% accurate. In this work, we bridge all these gaps by developing PGNNCert, the first certified defense of GNNs against poisoning attacks under arbitrary (edge, node, and node feature) perturbations with deterministic robustness guarantees. Extensive evaluations on multiple node and graph classification datasets and GNNs demonstrate the effectiveness of PGNNCert to provably defend against arbitrary poisoning perturbations. PGNNCert is also shown to significantly outperform the state-of-the-art certified defenses against edge perturbation or node perturbation during GNN training.
CRJan 9, 2025
Watermarking Graph Neural Networks via Explanations for Ownership ProtectionJane Downer, Ren Wang, Binghui Wang
Graph Neural Networks (GNNs) are the mainstream method to learn pervasive graph data and are widely deployed in industry, making their intellectual property valuable. However, protecting GNNs from unauthorized use remains a challenge. Watermarking, which embeds ownership information into a model, is a potential solution. However, existing watermarking methods have two key limitations: First, almost all of them focus on non-graph data, with watermarking GNNs for complex graph data largely unexplored. Second, the de facto backdoor-based watermarking methods pollute training data and induce ownership ambiguity through intentional misclassification. Our explanation-based watermarking inherits the strengths of backdoor-based methods (e.g., robust to watermark removal attacks), but avoids data pollution and eliminates intentional misclassification. In particular, our method learns to embed the watermark in GNN explanations such that this unique watermark is statistically distinct from other potential solutions, and ownership claims must show statistical significance to be verified. We theoretically prove that, even with full knowledge of our method, locating the watermark is an NP-hard problem. Empirically, our method manifests robustness to removal attacks like fine-tuning and pruning. By addressing these challenges, our approach marks a significant advancement in protecting GNN intellectual property.
CRMar 8, 2025
Backdoor Attacks on Discrete Graph Diffusion ModelsJiawen Wang, Samin Karim, Yuan Hong et al.
Diffusion models are powerful generative models in continuous data domains such as image and video data. Discrete graph diffusion models (DGDMs) have recently extended them for graph generation, which are crucial in fields like molecule and protein modeling, and obtained the SOTA performance. However, it is risky to deploy DGDMs for safety-critical applications (e.g., drug discovery) without understanding their security vulnerabilities. In this work, we perform the first study on graph diffusion models against backdoor attacks, a severe attack that manipulates both the training and inference/generation phases in graph diffusion models. We first define the threat model, under which we design the attack such that the backdoored graph diffusion model can generate 1) high-quality graphs without backdoor activation, 2) effective, stealthy, and persistent backdoored graphs with backdoor activation, and 3) graphs that are permutation invariant and exchangeable--two core properties in graph generative models. 1) and 2) are validated via empirical evaluations without and with backdoor defenses, while 3) is validated via theoretical results.
LGMar 6
When One Modality Rules Them All: Backdoor Modality Collapse in Multimodal Diffusion ModelsQitong Wang, Haoran Dai, Haotian Zhang et al.
While diffusion models have revolutionized visual content generation, their rapid adoption has underscored the critical need to investigate vulnerabilities, e.g., to backdoor attacks. In multimodal diffusion models, it is natural to expect that attacking multiple modalities simultaneously (e.g., text and image) would yield complementary effects and strengthen the overall backdoor. In this paper, we challenge this assumption by investigating the phenomenon of Backdoor Modality Collapse, a scenario where the backdoor mechanism degenerates to rely predominantly on a subset of modalities, rendering others redundant. To rigorously quantify this behavior, we introduce two novel metrics: Trigger Modality Attribution (TMA) and Cross-Trigger Interaction (CTI). Through extensive experiments across diverse training configurations in multimodal conditional diffusion, we consistently observe a ``winner-takes-all'' dynamic in backdoor behavior. Our results reveal that (1) attacks often collapse into subset-modality dominance, and (2) cross-modal interaction is negligible or even negative, contradicting the intuition of synergistic vulnerability. These findings highlight a critical blind spot in current assessments, suggesting that high attack success rates often mask a fundamental reliance on a subset of modalities. This establishes a principled foundation for mechanistic analysis and future defense development.
LGOct 16, 2025
Measure-Theoretic Anti-Causal Representation LearningArman Behnam, Binghui Wang
Causal representation learning in the anti-causal setting (labels cause features rather than the reverse) presents unique challenges requiring specialized approaches. We propose Anti-Causal Invariant Abstractions (ACIA), a novel measure-theoretic framework for anti-causal representation learning. ACIA employs a two-level design, low-level representations capture how labels generate observations, while high-level representations learn stable causal patterns across environment-specific variations. ACIA addresses key limitations of existing approaches by accommodating prefect and imperfect interventions through interventional kernels, eliminating dependency on explicit causal structures, handling high-dimensional data effectively, and providing theoretical guarantees for out-of-distribution generalization. Experiments on synthetic and real-world medical datasets demonstrate that ACIA consistently outperforms state-of-the-art methods in both accuracy and invariance metrics. Furthermore, our theoretical results establish tight bounds on performance gaps between training and unseen environments, confirming the efficacy of our approach for robust anti-causal learning.
SEJul 3, 2025
VeFIA: An Efficient Inference Auditing Framework for Vertical Federated Collaborative SoftwareChung-ju Huang, Ziqi Zhang, Yinggui Wang et al.
Vertical Federated Learning (VFL) is a distributed AI software deployment mechanism for cross-silo collaboration without accessing participants' data. However, existing VFL work lacks a mechanism to audit the execution correctness of the inference software of the data party. To address this problem, we design a Vertical Federated Inference Auditing (VeFIA) framework. VeFIA helps the task party to audit whether the data party's inference software is executed as expected during large-scale inference without leaking the data privacy of the data party or introducing additional latency to the inference system. The core of VeFIA is that the task party can use the inference results from a framework with Trusted Execution Environments (TEE) and the coordinator to validate the correctness of the data party's computation results. VeFIA guarantees that, as long as the abnormal inference exceeds 5.4%, the task party can detect execution anomalies in the inference software with a probability of 99.99%, without incurring any additional online inference latency. VeFIA's random sampling validation achieves 100% positive predictive value, negative predictive value, and true positive rate in detecting abnormal inference. To the best of our knowledge, this is the first paper to discuss the correctness of inference software execution in VFL.
LGMar 15, 2025
FedTilt: Towards Multi-Level Fairness-Preserving and Robust Federated LearningBinghui Zhang, Luis Mares De La Cruz, Binghui Wang
Federated Learning (FL) is an emerging decentralized learning paradigm that can partly address the privacy concern that cannot be handled by traditional centralized and distributed learning. Further, to make FL practical, it is also necessary to consider constraints such as fairness and robustness. However, existing robust FL methods often produce unfair models, and existing fair FL methods only consider one-level (client) fairness and are not robust to persistent outliers (i.e., injected outliers into each training round) that are common in real-world FL settings. We propose \texttt{FedTilt}, a novel FL that can preserve multi-level fairness and be robust to outliers. In particular, we consider two common levels of fairness, i.e., \emph{client fairness} -- uniformity of performance across clients, and \emph{client data fairness} -- uniformity of performance across different classes of data within a client. \texttt{FedTilt} is inspired by the recently proposed tilted empirical risk minimization, which introduces tilt hyperparameters that can be flexibly tuned. Theoretically, we show how tuning tilt values can achieve the two-level fairness and mitigate the persistent outliers, and derive the convergence condition of \texttt{FedTilt} as well. Empirically, our evaluation results on a suite of realistic federated datasets in diverse settings show the effectiveness and flexibility of the \texttt{FedTilt} framework and the superiority to the state-of-the-arts.
CRJun 5, 2024
Graph Neural Network Explanations are FragileJiate Li, Meng Pang, Yun Dong et al.
Explainable Graph Neural Network (GNN) has emerged recently to foster the trust of using GNNs. Existing GNN explainers are developed from various perspectives to enhance the explanation performance. We take the first step to study GNN explainers under adversarial attack--We found that an adversary slightly perturbing graph structure can ensure GNN model makes correct predictions, but the GNN explainer yields a drastically different explanation on the perturbed graph. Specifically, we first formulate the attack problem under a practical threat model (i.e., the adversary has limited knowledge about the GNN explainer and a restricted perturbation budget). We then design two methods (i.e., one is loss-based and the other is deduction-based) to realize the attack. We evaluate our attacks on various GNN explainers and the results show these explainers are fragile.
CLOct 15, 2021
Detecting Gender Bias in Transformer-based Models: A Case Study on BERTBingbing Li, Hongwu Peng, Rajat Sainju et al.
In this paper, we propose a novel gender bias detection method by utilizing attention map for transformer-based models. We 1) give an intuitive gender bias judgement method by comparing the different relation degree between the genders and the occupation according to the attention scores, 2) design a gender bias detector by modifying the attention module, 3) insert the gender bias detector into different positions of the model to present the internal gender bias flow, and 4) draw the consistent gender bias conclusion by scanning the entire Wikipedia, a BERT pretraining dataset. We observe that 1) the attention matrices, Wq and Wk introduce much more gender bias than other modules (including the embedding layer) and 2) the bias degree changes periodically inside of the model (attention matrix Q, K, V, and the remaining part of the attention layer (including the fully-connected layer, the residual connection, and the layer normalization module) enhance the gender bias while the averaged attentions reduces the bias).
LGAug 21, 2021
A Hard Label Black-box Adversarial Attack Against Graph Neural NetworksJiaming Mu, Binghui Wang, Qi Li et al.
Graph Neural Networks (GNNs) have achieved state-of-the-art performance in various graph structure related tasks such as node classification and graph classification. However, GNNs are vulnerable to adversarial attacks. Existing works mainly focus on attacking GNNs for node classification; nevertheless, the attacks against GNNs for graph classification have not been well explored. In this work, we conduct a systematic study on adversarial attacks against GNNs for graph classification via perturbing the graph structure. In particular, we focus on the most challenging attack, i.e., hard label black-box attack, where an attacker has no knowledge about the target GNN model and can only obtain predicted labels through querying the target model.To achieve this goal, we formulate our attack as an optimization problem, whose objective is to minimize the number of edges to be perturbed in a graph while maintaining the high attack success rate. The original optimization problem is intractable to solve, and we relax the optimization problem to be a tractable one, which is solved with theoretical convergence guarantee. We also design a coarse-grained searching algorithm and a query-efficient gradient computation algorithm to decrease the number of queries to the target GNN model. Our experimental results on three real-world datasets demonstrate that our attack can effectively attack representative GNNs for graph classification with less queries and perturbations. We also evaluate the effectiveness of our attack under two defenses: one is well-designed adversarial graph detector and the other is that the target GNN model itself is equipped with a defense to prevent adversarial graph generation. Our experimental results show that such defenses are not effective enough, which highlights more advanced defenses.
LGJul 3, 2021
Privacy-Preserving Representation Learning on Graphs: A Mutual Information PerspectiveBinghui Wang, Jiayi Guo, Ang Li et al.
Learning with graphs has attracted significant attention recently. Existing representation learning methods on graphs have achieved state-of-the-art performance on various graph-related tasks such as node classification, link prediction, etc. However, we observe that these methods could leak serious private information. For instance, one can accurately infer the links (or node identity) in a graph from a node classifier (or link predictor) trained on the learnt node representations by existing methods. To address the issue, we propose a privacy-preserving representation learning framework on graphs from the \emph{mutual information} perspective. Specifically, our framework includes a primary learning task and a privacy protection task, and we consider node classification and link prediction as the two tasks of interest. Our goal is to learn node representations such that they can be used to achieve high performance for the primary learning task, while obtaining performance for the privacy protection task close to random guessing. We formally formulate our goal via mutual information objectives. However, it is intractable to compute mutual information in practice. Then, we derive tractable variational bounds for the mutual information terms, where each bound can be parameterized via a neural network. Next, we train these parameterized neural networks to approximate the true mutual information and learn privacy-preserving node representations. We finally evaluate our framework on various graph datasets.
CVApr 22, 2021
Towards Adversarial Patch Analysis and Certified Defense against Crowd CountingQiming Wu, Zhikang Zou, Pan Zhou et al.
Crowd counting has drawn much attention due to its importance in safety-critical surveillance systems. Especially, deep neural network (DNN) methods have significantly reduced estimation errors for crowd counting missions. Recent studies have demonstrated that DNNs are vulnerable to adversarial attacks, i.e., normal images with human-imperceptible perturbations could mislead DNNs to make false predictions. In this work, we propose a robust attack strategy called Adversarial Patch Attack with Momentum (APAM) to systematically evaluate the robustness of crowd counting models, where the attacker's goal is to create an adversarial perturbation that severely degrades their performances, thus leading to public safety accidents (e.g., stampede accidents). Especially, the proposed attack leverages the extreme-density background information of input images to generate robust adversarial patches via a series of transformations (e.g., interpolation, rotation, etc.). We observe that by perturbing less than 6\% of image pixels, our attacks severely degrade the performance of crowd counting systems, both digitally and physically. To better enhance the adversarial robustness of crowd counting models, we propose the first regression model-based Randomized Ablation (RA), which is more sufficient than Adversarial Training (ADT) (Mean Absolute Error of RA is 5 lower than ADT on clean samples and 30 lower than ADT on adversarial examples). Extensive experiments on five crowd counting models demonstrate the effectiveness and generality of the proposed method. The supplementary materials and certificate retrained models are available at \url{https://www.dropbox.com/s/hc4fdx133vht0qb/ACM_MM2021_Supp.pdf?dl=0}
LGDec 24, 2020
Semi-Supervised Node Classification on Graphs: Markov Random Fields vs. Graph Neural NetworksBinghui Wang, Jinyuan Jia, Neil Zhenqiang Gong
Semi-supervised node classification on graph-structured data has many applications such as fraud detection, fake account and review detection, user's private attribute inference in social networks, and community detection. Various methods such as pairwise Markov Random Fields (pMRF) and graph neural networks were developed for semi-supervised node classification. pMRF is more efficient than graph neural networks. However, existing pMRF-based methods are less accurate than graph neural networks, due to a key limitation that they assume a heuristics-based constant edge potential for all edges. In this work, we aim to address the key limitation of existing pMRF-based methods. In particular, we propose to learn edge potentials for pMRF. Our evaluation results on various types of graph datasets show that our optimized pMRF-based method consistently outperforms existing graph neural networks in terms of both accuracy and efficiency. Our results highlight that previous work may have underestimated the power of pMRF for semi-supervised node classification.
LGDec 8, 2020
Provable Defense against Privacy Leakage in Federated Learning from Representation PerspectiveJingwei Sun, Ang Li, Binghui Wang et al.
Federated learning (FL) is a popular distributed learning framework that can reduce privacy risks by not explicitly sharing private data. However, recent works demonstrated that sharing model updates makes FL vulnerable to inference attacks. In this work, we show our key observation that the data representation leakage from gradients is the essential cause of privacy leakage in FL. We also provide an analysis of this observation to explain how the data presentation is leaked. Based on this observation, we propose a defense against model inversion attack in FL. The key idea of our defense is learning to perturb data representation such that the quality of the reconstructed data is severely degraded, while FL performance is maintained. In addition, we derive certified robustness guarantee to FL and convergence guarantee to FedAvg, after applying our defense. To evaluate our defense, we conduct experiments on MNIST and CIFAR10 for defending against the DLG attack and GS attack. Without sacrificing accuracy, the results demonstrate that our proposed defense can increase the mean squared error between the reconstructed data and the raw data by as much as more than 160X for both DLG attack and GS attack, compared with baseline defense methods. The privacy of the FL system is significantly improved.
LGDec 8, 2020
GraphFL: A Federated Learning Framework for Semi-Supervised Node Classification on GraphsBinghui Wang, Ang Li, Hai Li et al.
Graph-based semi-supervised node classification (GraphSSC) has wide applications, ranging from networking and security to data mining and machine learning, etc. However, existing centralized GraphSSC methods are impractical to solve many real-world graph-based problems, as collecting the entire graph and labeling a reasonable number of labels is time-consuming and costly, and data privacy may be also violated. Federated learning (FL) is an emerging learning paradigm that enables collaborative learning among multiple clients, which can mitigate the issue of label scarcity and protect data privacy as well. Therefore, performing GraphSSC under the FL setting is a promising solution to solve real-world graph-based problems. However, existing FL methods 1) perform poorly when data across clients are non-IID, 2) cannot handle data with new label domains, and 3) cannot leverage unlabeled data, while all these issues naturally happen in real-world graph-based problems. To address the above issues, we propose the first FL framework, namely GraphFL, for semi-supervised node classification on graphs. Our framework is motivated by meta-learning methods. Specifically, we propose two GraphFL methods to respectively address the non-IID issue in graph data and handle the tasks with new label domains. Furthermore, we design a self-training method to leverage unlabeled graph data. We adopt representative graph neural networks as GraphSSC methods and evaluate GraphFL on multiple graph datasets. Experimental results demonstrate that GraphFL significantly outperforms the compared FL baseline and GraphFL with self-training can obtain better performance.
CRNov 15, 2020
Almost Tight L0-norm Certified Robustness of Top-k Predictions against Adversarial PerturbationsJinyuan Jia, Binghui Wang, Xiaoyu Cao et al.
Top-k predictions are used in many real-world applications such as machine learning as a service, recommender systems, and web searches. $\ell_0$-norm adversarial perturbation characterizes an attack that arbitrarily modifies some features of an input such that a classifier makes an incorrect prediction for the perturbed input. $\ell_0$-norm adversarial perturbation is easy to interpret and can be implemented in the physical world. Therefore, certifying robustness of top-$k$ predictions against $\ell_0$-norm adversarial perturbation is important. However, existing studies either focused on certifying $\ell_0$-norm robustness of top-$1$ predictions or $\ell_2$-norm robustness of top-$k$ predictions. In this work, we aim to bridge the gap. Our approach is based on randomized smoothing, which builds a provably robust classifier from an arbitrary classifier via randomizing an input. Our major theoretical contribution is an almost tight $\ell_0$-norm certified robustness guarantee for top-$k$ predictions. We empirically evaluate our method on CIFAR10 and ImageNet. For instance, our method can build a classifier that achieves a certified top-3 accuracy of 69.2\% on ImageNet when an attacker can arbitrarily perturb 5 pixels of a testing image.
CROct 26, 2020
Robust and Verifiable Information Embedding Attacks to Deep Neural Networks via Error-Correcting CodesJinyuan Jia, Binghui Wang, Neil Zhenqiang Gong
In the era of deep learning, a user often leverages a third-party machine learning tool to train a deep neural network (DNN) classifier and then deploys the classifier as an end-user software product or a cloud service. In an information embedding attack, an attacker is the provider of a malicious third-party machine learning tool. The attacker embeds a message into the DNN classifier during training and recovers the message via querying the API of the black-box classifier after the user deploys it. Information embedding attacks have attracted growing attention because of various applications such as watermarking DNN classifiers and compromising user privacy. State-of-the-art information embedding attacks have two key limitations: 1) they cannot verify the correctness of the recovered message, and 2) they are not robust against post-processing of the classifier. In this work, we aim to design information embedding attacks that are verifiable and robust against popular post-processing methods. Specifically, we leverage Cyclic Redundancy Check to verify the correctness of the recovered message. Moreover, to be robust against post-processing, we leverage Turbo codes, a type of error-correcting codes, to encode the message before embedding it to the DNN classifier. We propose to recover the message via adaptively querying the classifier to save queries. Our adaptive recovery strategy leverages the property of Turbo codes that supports error correcting with a partial code. We evaluate our information embedding attacks using simulated messages and apply them to three applications, where messages have semantic interpretations. We consider 8 popular methods to post-process the classifier. Our results show that our attacks can accurately and verifiably recover the messages in all considered scenarios, while state-of-the-art attacks cannot accurately recover the messages in many scenarios.
CRSep 1, 2020
Reinforcement Learning-based Black-Box Evasion Attacks to Link Prediction in Dynamic GraphsHouxiang Fan, Binghui Wang, Pan Zhou et al.
Link prediction in dynamic graphs (LPDG) is an important research problem that has diverse applications such as online recommendations, studies on disease contagion, organizational studies, etc. Various LPDG methods based on graph embedding and graph neural networks have been recently proposed and achieved state-of-the-art performance. In this paper, we study the vulnerability of LPDG methods and propose the first practical black-box evasion attack. Specifically, given a trained LPDG model, our attack aims to perturb the graph structure, without knowing to model parameters, model architecture, etc., such that the LPDG model makes as many wrong predicted links as possible. We design our attack based on a stochastic policy-based RL algorithm. Moreover, we evaluate our attack on three real-world graph datasets from different application domains. Experimental results show that our attack is both effective and efficient.
CRAug 24, 2020
Certified Robustness of Graph Neural Networks against Adversarial Structural PerturbationBinghui Wang, Jinyuan Jia, Xiaoyu Cao et al.
Graph neural networks (GNNs) have recently gained much attention for node and graph classification tasks on graph-structured data. However, multiple recent works showed that an attacker can easily make GNNs predict incorrectly via perturbing the graph structure, i.e., adding or deleting edges in the graph. We aim to defend against such attacks via developing certifiably robust GNNs. Specifically, we prove the certified robustness guarantee of any GNN for both node and graph classifications against structural perturbation. Moreover, we show that our certified robustness guarantee is tight. Our results are based on a recently proposed technique called randomized smoothing, which we extend to graph data. We also empirically evaluate our method for both node and graph classifications on multiple GNNs and multiple benchmark datasets. For instance, on the Cora dataset, Graph Convolutional Network with our randomized smoothing can achieve a certified accuracy of 0.49 when the attacker can arbitrarily add/delete at most 15 edges in the graph.
LGAug 7, 2020
LotteryFL: Personalized and Communication-Efficient Federated Learning with Lottery Ticket Hypothesis on Non-IID DatasetsAng Li, Jingwei Sun, Binghui Wang et al.
Federated learning is a popular distributed machine learning paradigm with enhanced privacy. Its primary goal is learning a global model that offers good performance for the participants as many as possible. The technology is rapidly advancing with many unsolved challenges, among which statistical heterogeneity (i.e., non-IID) and communication efficiency are two critical ones that hinder the development of federated learning. In this work, we propose LotteryFL -- a personalized and communication-efficient federated learning framework via exploiting the Lottery Ticket hypothesis. In LotteryFL, each client learns a lottery ticket network (i.e., a subnetwork of the base model) by applying the Lottery Ticket hypothesis, and only these lottery networks will be communicated between the server and clients. Rather than learning a shared global model in classic federated learning, each client learns a personalized model via LotteryFL; the communication cost can be significantly reduced due to the compact size of lottery networks. To support the training and evaluation of our framework, we construct non-IID datasets based on MNIST, CIFAR-10 and EMNIST by taking feature distribution skew, label distribution skew and quantity skew into consideration. Experiments on these non-IID datasets demonstrate that LotteryFL significantly outperforms existing solutions in terms of personalization and communication cost.
CRJun 19, 2020
Backdoor Attacks to Graph Neural NetworksZaixi Zhang, Jinyuan Jia, Binghui Wang et al.
In this work, we propose the first backdoor attack to graph neural networks (GNN). Specifically, we propose a \emph{subgraph based backdoor attack} to GNN for graph classification. In our backdoor attack, a GNN classifier predicts an attacker-chosen target label for a testing graph once a predefined subgraph is injected to the testing graph. Our empirical results on three real-world graph datasets show that our backdoor attacks are effective with a small impact on a GNN's prediction accuracy for clean testing graphs. Moreover, we generalize a randomized smoothing based certified defense to defend against our backdoor attacks. Our empirical results show that the defense is effective in some cases but ineffective in other cases, highlighting the needs of new defenses for our backdoor attacks.
CRApr 29, 2020
Perturbing Across the Feature Hierarchy to Improve Standard and Strict Blackbox Attack TransferabilityNathan Inkawhich, Kevin J Liang, Binghui Wang et al.
We consider the blackbox transfer-based targeted adversarial attack threat model in the realm of deep neural network (DNN) image classifiers. Rather than focusing on crossing decision boundaries at the output layer of the source model, our method perturbs representations throughout the extracted feature hierarchy to resemble other classes. We design a flexible attack framework that allows for multi-layer perturbations and demonstrates state-of-the-art targeted transfer performance between ImageNet DNNs. We also show the superiority of our feature space methods under a relaxation of the common assumption that the source and target models are trained on the same dataset and label space, in some instances achieving a $10\times$ increase in targeted success rate relative to other blackbox transfer methods. Finally, we analyze why the proposed methods outperform existing attack strategies and show an extension of the method in the case when limited queries to the blackbox model are allowed.
CRFeb 26, 2020
On Certifying Robustness against Backdoor Attacks via Randomized SmoothingBinghui Wang, Xiaoyu Cao, Jinyuan jia et al.
Backdoor attack is a severe security threat to deep neural networks (DNNs). We envision that, like adversarial examples, there will be a cat-and-mouse game for backdoor attacks, i.e., new empirical defenses are developed to defend against backdoor attacks but they are soon broken by strong adaptive backdoor attacks. To prevent such cat-and-mouse game, we take the first step towards certified defenses against backdoor attacks. Specifically, in this work, we study the feasibility and effectiveness of certifying robustness against backdoor attacks using a recent technique called randomized smoothing. Randomized smoothing was originally developed to certify robustness against adversarial examples. We generalize randomized smoothing to defend against backdoor attacks. Our results show the theoretical feasibility of using randomized smoothing to certify robustness against backdoor attacks. However, we also find that existing randomized smoothing methods have limited effectiveness at defending against backdoor attacks, which highlight the needs of new theory and methods to certify robustness against backdoor attacks.
CRFeb 9, 2020
Certified Robustness of Community Detection against Adversarial Structural Perturbation via Randomized SmoothingJinyuan Jia, Binghui Wang, Xiaoyu Cao et al.
Community detection plays a key role in understanding graph structure. However, several recent studies showed that community detection is vulnerable to adversarial structural perturbation. In particular, via adding or removing a small number of carefully selected edges in a graph, an attacker can manipulate the detected communities. However, to the best of our knowledge, there are no studies on certifying robustness of community detection against such adversarial structural perturbation. In this work, we aim to bridge this gap. Specifically, we develop the first certified robustness guarantee of community detection against adversarial structural perturbation. Given an arbitrary community detection method, we build a new smoothed community detection method via randomly perturbing the graph structure. We theoretically show that the smoothed community detection method provably groups a given arbitrary set of nodes into the same community (or different communities) when the number of edges added/removed by an attacker is bounded. Moreover, we show that our certified robustness is tight. We also empirically evaluate our method on multiple real-world graphs with ground truth communities.
CRMar 1, 2019
Attacking Graph-based Classification via Manipulating the Graph StructureBinghui Wang, Neil Zhenqiang Gong
Graph-based classification methods are widely used for security and privacy analytics. Roughly speaking, graph-based classification methods include collective classification and graph neural network. Evading a graph-based classification method enables an attacker to evade detection in security analytics and can be used as a privacy defense against inference attacks. Existing adversarial machine learning studies mainly focused on machine learning for non-graph data. Only a few recent studies touched adversarial graph-based classification methods. However, they focused on graph neural network methods, leaving adversarial collective classification largely unexplored. We aim to bridge this gap in this work. We first propose a threat model to characterize the attack surface of a collective classification method. Specifically, we characterize an attacker's background knowledge along three dimensions: parameters of the method, training dataset, and the complete graph; an attacker's goal is to evade detection via manipulating the graph structure. We formulate our attack as a graph-based optimization problem, solving which produces the edges that an attacker needs to manipulate to achieve its attack goal. Moreover, we propose several approximation techniques to solve the optimization problem. We evaluate our attacks and compare them with a recent attack designed for graph neural networks. Results show that our attacks 1) can effectively evade graph-based classification methods; 2) do not require access to the true parameters, true training dataset, and/or complete graph; and 3) outperform the existing attack for evading collective classification methods and some graph neural network methods. We also apply our attacks to evade Sybil detection using a large-scale Twitter dataset and apply our attacks as a defense against attribute inference attacks using a large-scale Google+ dataset.