CV · Aug 16, 2022

Neural network fragile watermarking with no model performance degradation

arXiv:2208.07585v1 · 26 citations · h-index: 62
Originality: Incremental advance
AI Analysis

This work addresses security concerns for users of neural network models by providing a method for detecting malicious fine-tuning that avoids the performance trade-offs common in existing approaches.

The paper tackles the problem of detecting malicious fine-tuning attacks on deep neural networks and reports effective detection with no loss in model performance.

Deep neural networks are vulnerable to malicious fine-tuning attacks such as data poisoning and backdoor attacks. Recent research has therefore proposed methods for detecting malicious fine-tuning of neural network models. However, these methods usually degrade the performance of the protected model. Thus, we propose a novel neural network fragile watermarking method with no model performance degradation. During watermarking, we train a generative model with a specific loss function and a secret key to generate triggers that are sensitive to fine-tuning of the target classifier. During verification, we use the watermarked classifier to obtain the label of each fragile trigger. Malicious fine-tuning can then be detected by comparing the secret key with these labels. Experiments on classic datasets and classifiers show that the proposed method can effectively detect malicious fine-tuning of a model with no model performance degradation.
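
The verification step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `verify_watermark` and `make_threshold_classifier` are hypothetical, the one-dimensional threshold classifier is a toy stand-in for a real network, and the decision rule (any mismatch between labels and the secret key flags tampering) is an assumption, since the abstract does not state a threshold. Training the trigger generator is not shown; the triggers are simply hand-placed near the decision boundary so that small changes to the classifier flip their labels.

```python
from typing import Callable, Sequence, Tuple


def verify_watermark(
    classify: Callable[[float], int],
    triggers: Sequence[float],
    secret_key: Sequence[int],
) -> Tuple[bool, float]:
    """Compare the classifier's labels on the triggers with the secret key.

    Returns (intact, match_rate). Assumption: the model counts as intact
    only if every trigger label matches its key entry.
    """
    if not secret_key or len(triggers) != len(secret_key):
        raise ValueError("triggers and secret_key must be non-empty and equal in length")
    labels = [classify(t) for t in triggers]
    matches = sum(1 for label, key in zip(labels, secret_key) if label == key)
    match_rate = matches / len(secret_key)
    return match_rate == 1.0, match_rate


def make_threshold_classifier(threshold: float) -> Callable[[float], int]:
    """Toy stand-in for a classifier: label 1 if the input exceeds the threshold."""
    return lambda x: 1 if x > threshold else 0


# Hand-placed triggers close to the original decision boundary at 0.5.
triggers = [0.49, 0.51, 0.495, 0.505]
secret_key = [0, 1, 0, 1]

original = make_threshold_classifier(0.50)
fine_tuned = make_threshold_classifier(0.52)  # boundary shifted slightly

print(verify_watermark(original, triggers, secret_key))    # (True, 1.0)
print(verify_watermark(fine_tuned, triggers, secret_key))  # (False, 0.5)
```

In this toy setting, shifting the boundary by 0.02 flips two of the four trigger labels, so the comparison against the secret key fails even though inputs far from the boundary would be classified exactly as before.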

