CRFeb 6
Trojans in Artificial Intelligence (TrojAI) Final ReportKristopher W. Reese, Taylor Kulp-McDowall, Michael Majurski et al.
The Intelligence Advanced Research Projects Activity (IARPA) launched the TrojAI program to confront an emerging vulnerability in modern artificial intelligence: the threat of AI Trojans. These AI trojans are malicious, hidden backdoors intentionally embedded within an AI model that can cause a system to fail in unexpected ways, or allow a malicious actor to hijack the AI model at will. This multi-year initiative helped to map out the complex nature of the threat, pioneered foundational detection methods, and identified unsolved challenges that require ongoing attention by the burgeoning AI security field. This report synthesizes the program's key findings, including methodologies for detection through weight analysis and trigger inversion, as well as approaches for mitigating Trojan risks in deployed models. Comprehensive test and evaluation results highlight detector performance, sensitivity, and the prevalence of "natural" Trojans. The report concludes with lessons learned and recommendations for advancing AI security research.
LGAug 15, 2022
A Tool for Neural Network Global Robustness Certification and TrainingZhilu Wang, Yixuan Wang, Feisi Fu et al.
With the increment of interest in leveraging machine learning technology in safety-critical systems, the robustness of neural networks under external disturbance receives more and more concerns. Global robustness is a robustness property defined on the entire input domain. And a certified globally robust network can ensure its robustness on any possible network input. However, the state-of-the-art global robustness certification algorithm can only certify networks with at most several thousand neurons. In this paper, we propose the GPU-supported global robustness certification framework GROCET, which is more efficient than the previous optimization-based certification approach. Moreover, GROCET provides differentiable global robustness, which is leveraged in the training of globally robust neural networks.
CRJun 15, 2023
OVLA: Neural Network Ownership Verification using Latent WatermarksFeisi Fu, Wenchao Li
Ownership verification for neural networks is important for protecting these models from illegal copying, free-riding, re-distribution and other intellectual property misuse. We present a novel methodology for neural network ownership verification based on the notion of latent watermarks. Existing ownership verification methods either modify or introduce constraints to the neural network parameters, which are accessible to an attacker in a white-box attack and can be harmful to the network's normal operation, or train the network to respond to specific watermarks in the inputs similar to data poisoning-based backdoor attacks, which are susceptible to backdoor removal techniques. In this paper, we address these problems by decoupling a network's normal operation from its responses to watermarked inputs during ownership verification. The key idea is to train the network such that the watermarks remain dormant unless the owner's secret key is applied to activate it. The secret key is realized as a specific perturbation only known to the owner to the network's parameters. We show that our approach offers strong defense against backdoor detection, backdoor removal and surrogate model attacks.In addition, our method provides protection against ambiguity attacks where the attacker either tries to guess the secret weight key or uses fine-tuning to embed their own watermarks with a different key into a pre-trained neural network. Experimental results demonstrate the advantages and effectiveness of our proposed approach.
CRNov 2, 2022
Dormant Neural TrojansFeisi Fu, Panagiota Kiourti, Wenchao Li
We present a novel methodology for neural network backdoor attacks. Unlike existing training-time attacks where the Trojaned network would respond to the Trojan trigger after training, our approach inserts a Trojan that will remain dormant until it is activated. The activation is realized through a specific perturbation to the network's weight parameters only known to the attacker. Our analysis and the experimental results demonstrate that dormant Trojaned networks can effectively evade detection by state-of-the-art backdoor detection methods.
LGOct 14, 2021Code
Sound and Complete Neural Network Repair with Minimality and Locality GuaranteesFeisi Fu, Wenchao Li
We present a novel methodology for repairing neural networks that use ReLU activation functions. Unlike existing methods that rely on modifying the weights of a neural network which can induce a global change in the function space, our approach applies only a localized change in the function space while still guaranteeing the removal of the buggy behavior. By leveraging the piecewise linear nature of ReLU networks, our approach can efficiently construct a patch network tailored to the linear region where the buggy input resides, which when combined with the original network, provably corrects the behavior on the buggy input. Our method is both sound and complete -- the repaired network is guaranteed to fix the buggy input, and a patch is guaranteed to be found for any buggy input. Moreover, our approach preserves the continuous piecewise linear nature of ReLU networks, automatically generalizes the repair to all the points including other undetected buggy inputs inside the repair region, is minimal in terms of changes in the function space, and guarantees that outputs on inputs away from the repair region are unaltered. On several benchmarks, we show that our approach significantly outperforms existing methods in terms of locality and limiting negative side effects. Our code is available on GitHub: https://github.com/BU-DEPEND-Lab/REASSURE.