CR · Aug 22, 2023
Adaptive White-Box Watermarking with Self-Mutual Check Parameters in Deep Neural Networks
Zhenzhe Gao, Zhaoxia Yin, Hongjian Zhan et al.
Artificial Intelligence (AI) has found wide application, but it also poses risks because models can be tampered with, unintentionally or maliciously, during deployment. Regular checks are therefore necessary to detect and prevent such risks. Fragile watermarking is a technique for identifying tampering in AI models. However, previous methods have faced challenges, including the risk of missing tampered parameters, the need to transmit additional information, and the inability to locate tampering precisely. In this paper, we propose a method for detecting tampered parameters and bits that can detect, locate, and restore parameters that have been tampered with. We also propose an adaptive embedding method that maximizes information capacity while maintaining model accuracy. We tested our approach on multiple neural networks subjected to attacks that modify weight parameters, and the results show that our method achieves strong recovery performance when the modification rate is below 20%. Furthermore, for models whose accuracy was significantly affected by watermarking, we used an adaptive bit technique to recover more than 15% of the accuracy loss.
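To make the detect-and-locate idea concrete, here is a minimal sketch of a generic block-wise fragile watermark for float32 weights. It is not the self-mutual check scheme described above: it assumes a hypothetical layout in which each block of weights stores SHA-256-derived check bits in the least significant mantissa bit of its own weights, and the names `embed_check_bits` and `verify_blocks` are illustrative. It detects and locates tampered blocks but does not attempt restoration.

```python
import hashlib
import struct

def f32_bits(x):
    """IEEE-754 single-precision bit pattern of x (x is rounded to float32 first)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b):
    """Inverse of f32_bits: turn a 32-bit pattern back into a Python float."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def check_bits(block_bits):
    """One check bit per weight, derived from a SHA-256 digest of the block with LSBs cleared."""
    h = hashlib.sha256()
    for b in block_bits:
        h.update(struct.pack("<I", b & 0xFFFFFFFE))  # ignore the LSB that carries the watermark
    d = h.digest()
    return [(d[i // 8] >> (i % 8)) & 1 for i in range(len(block_bits))]

def embed_check_bits(weights, block_size=32):
    """Write each block's check bits into the LSBs of that block's own weights."""
    assert 0 < block_size <= 256  # SHA-256 yields only 256 bits
    bits = [f32_bits(w) for w in weights]
    for start in range(0, len(bits), block_size):
        block = bits[start:start + block_size]
        for j, c in enumerate(check_bits(block)):
            bits[start + j] = (block[j] & 0xFFFFFFFE) | c
    return [bits_f32(b) for b in bits]

def verify_blocks(weights, block_size=32):
    """Return indices of blocks whose stored LSBs no longer match their recomputed check bits."""
    bits = [f32_bits(w) for w in weights]
    flagged = []
    for start in range(0, len(bits), block_size):
        block = bits[start:start + block_size]
        if [b & 1 for b in block] != check_bits(block):
            flagged.append(start // block_size)
    return flagged
```

Because only the lowest mantissa bit of each weight changes, the marked weights stay within about one float32 ulp of their rounded values. The block size trades localization precision (smaller blocks) against the chance that a tampered block still matches its check bits by coincidence (2^-block_size).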
CV · Apr 21
Adversarial Attacks on Medical Hyperspectral Imaging Exploiting Spectral-Spatial Dependencies and Multiscale Features
Yunrui Gu, Zhenzhe Gao, Cong Kong et al.
Medical hyperspectral imaging (MHSI) has shown strong potential for disease diagnosis by capturing spectral-spatial information of tissues. While deep learning has substantially improved MHSI classification accuracy, its robustness remains limited due to the well-known trade-off between accuracy and robustness in Deep Neural Networks (DNNs). This issue is particularly critical in MHSI, where reliable prediction depends on local tissue relationships and multiscale spectral-spatial structures. A practical way to improve robustness is to identify the most unstable adversarial examples and incorporate them into adversarial training. However, existing attack methods do not sufficiently exploit these MHSI-specific properties, leading to suboptimal attack effectiveness and limited value for robustness enhancement. To address this gap, we propose a structured adversarial attack framework for MHSI that progressively models its local spectral-spatial dependencies and multiscale hierarchical representations. The proposed method generates anatomically consistent perturbations by modeling neighborhood dependencies and hierarchical spectral-spatial features. Experiments on the brain and choledoch datasets show that our method more effectively degrades lesion-related classification performance in critical tumor regions than existing baselines while maintaining low perturbation magnitude. These results reveal a clinically relevant robustness weakness in current MHSI models and provide stronger adversarial samples for developing targeted defense strategies.
CR · Apr 11, 2024
Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing
ZhenZhe Gao, Zhenjun Tang, Zhaoxia Yin et al.
Neural networks have increasingly influenced people's lives. Ensuring that neural networks are deployed faithfully, as designed by their model owners, is crucial, since they may be susceptible to various malicious or unintentional modifications, such as backdoor and poisoning attacks. Fragile model watermarks aim to prevent unexpected tampering that could lead DNN models to make incorrect decisions by detecting any tampering with the model as sensitively as possible. However, prior watermarking methods suffered from inefficient sample generation and insufficient sensitivity, limiting their practical applicability. Our approach employs a sample-pairing technique that places the model's decision boundary between pairs of samples while simultaneously maximizing logits. This makes the model's decisions on sensitive samples change as much as possible, so their Top-1 labels flip easily regardless of the direction in which the boundary moves.
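The sample-pairing idea can be illustrated on a toy binary linear classifier. The sketch below assumes a hypothetical model with weights `w` and bias `b`; the helper names `predict`, `make_pair`, and `is_tampered` are illustrative. It places two samples at distance `eps` on opposite sides of the decision boundary and records their Top-1 labels. It omits the logit maximization and the multi-class DNN setting of the paper, so it only shows why a pair catches boundary shifts in either direction.

```python
import math

def predict(w, b, x):
    """Top-1 label of a toy binary linear classifier: 1 if w.x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def make_pair(w, b, eps):
    """Place two samples at distance eps on either side of the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    n = [wi / norm for wi in w]                 # unit normal of the boundary
    x0 = [-b * wi / (norm * norm) for wi in w]  # a point lying exactly on the boundary
    plus = [a + eps * c for a, c in zip(x0, n)]
    minus = [a - eps * c for a, c in zip(x0, n)]
    return plus, minus

def is_tampered(w, b, pairs, expected):
    """Flag the model if any paired sample's Top-1 label differs from the recorded one."""
    return any(
        predict(w, b, x) != y
        for pair, labels in zip(pairs, expected)
        for x, y in zip(pair, labels)
    )
```

Shifting the boundary far enough in either direction (here, changing `b` by more than `eps` times the norm of `w`) flips one sample of the pair, which is the direction-independence the pairing is meant to provide.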
CR · Jun 7, 2024
A Survey of Fragile Model Watermarking
Zhenzhe Gao, Yu Cheng, Zhaoxia Yin
Model fragile watermarking, inspired by both adversarial attacks on neural networks and traditional multimedia fragile watermarking, has emerged as a potent tool for detecting tampering and has developed rapidly in recent years. Unlike robust watermarks, which are widely used to identify model copyright, fragile model watermarks are designed to determine whether a model has been subjected to unexpected alterations such as backdoors, poisoning, or compression. These alterations can pose unknown risks to model users, such as misidentifying stop signs as speed-limit signs in the classic autonomous-driving scenario. This paper reviews work on model fragile watermarking since the field's inception, categorizing it and tracing the field's development, thus offering a comprehensive survey to guide future work on model fragile watermarking.
CR · May 13, 2023
Decision-based iterative fragile watermarking for model integrity verification
Zhaoxia Yin, Heng Yin, Hang Su et al.
Typically, foundation models are hosted on cloud servers to meet the high demand for their services. However, this exposes them to security risks, as attackers can modify them after they are uploaded to the cloud or while they are transferred from a local system. To address this issue, we propose an iterative decision-based fragile watermarking algorithm that transforms normal training samples into fragile samples that are sensitive to model changes. During validation, we compare the outputs of the sensitive samples on the original model with those on the possibly compromised model to assess the model's integrity. The proposed fragile watermarking algorithm is formulated as an optimization problem that minimizes the variance of the predicted probability distribution output by the target model when fed the converted sample. We convert normal samples to fragile samples through multiple iterations. Our method has several advantages: (1) the samples are updated iteratively in a decision-based black-box manner, relying solely on the predicted probability distribution of the target model, which reduces the risk of exposure to adversarial attacks; (2) the small-amplitude, multi-iteration approach keeps the fragile samples visually close to the originals, with a PSNR of 55 dB relative to the original samples on TinyImageNet; (3) the fragile samples can detect changes to the model's overall parameters with magnitudes as small as 1e-4; and (4) the method is independent of the specific model structure and dataset. We demonstrate the effectiveness of our method on multiple models and datasets and show that it outperforms the current state of the art.
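The variance objective described above can be sketched in plain Python. The abstract does not specify the update rule, so this sketch assumes a simple greedy random search: each iteration proposes a small random perturbation and keeps it only if the variance of the returned probability vector decreases. `toy_model`, `make_fragile`, and the parameters `steps`, `step`, and `max_delta` are hypothetical; only the model's output probabilities are queried, mirroring the black-box setting.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def toy_model(x):
    """Hypothetical black-box target: returns only a probability vector over 3 classes."""
    W = [[1.5, -0.5, 0.3, 0.8],
         [-0.7, 1.2, 0.4, -0.2],
         [0.2, -0.3, -1.1, 0.9]]
    return softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in W])

def variance(p):
    mu = sum(p) / len(p)
    return sum((v - mu) ** 2 for v in p) / len(p)

def make_fragile(query, x0, steps=500, step=0.01, max_delta=0.1, seed=0):
    """Greedy random search (an assumed update rule): keep a perturbation only if it lowers output variance."""
    rng = random.Random(seed)
    x = list(x0)
    best = variance(query(x))
    for _ in range(steps):
        cand = []
        for xi, oi in zip(x, x0):
            v = xi + rng.uniform(-step, step)
            v = min(max(v, oi - max_delta), oi + max_delta)  # stay close to the original sample
            cand.append(min(max(v, 0.0), 1.0))               # keep a valid [0, 1] input range
        score = variance(query(cand))
        if score < best:
            x, best = cand, score
    return x, best
```

Driving the output toward a near-uniform distribution places the sample close to several decision boundaries, which is why small parameter changes can alter its prediction; verification would then compare the outputs for the returned sample on the original and the deployed model.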