Adaptive White-Box Watermarking with Self-Mutual Check Parameters in Deep Neural Networks
This paper addresses the risk of tampering with AI models during deployment and offers a solution for model integrity verification, though it appears incremental because it builds on existing fragile watermarking techniques.
The paper tackles the problem of detecting and locating tampering in deep neural networks by proposing an adaptive white-box watermarking method with self-mutual check parameters. It reports strong recovery performance when the modification rate is below 20% and, for models whose accuracy is significantly affected by watermarking, recovers more than 15% of the accuracy loss through an adaptive bit technique.
Artificial Intelligence (AI) has found wide application, but also poses risks due to unintentional or malicious tampering during deployment. Regular checks are therefore necessary to detect and prevent such risks. Fragile watermarking is a technique used to identify tampering in AI models. However, previous methods have faced challenges including risks of omission, additional information transmission, and inability to locate tampering precisely. In this paper, we propose a method for detecting tampered parameters and bits, which can be used to detect, locate, and restore parameters that have been tampered with. We also propose an adaptive embedding method that maximizes information capacity while maintaining model accuracy. Our approach was tested on multiple neural networks subjected to attacks that modified weight parameters, and our results demonstrate that our method achieved great recovery performance when the modification rate was below 20%. Furthermore, for models where watermarking significantly affected accuracy, we utilized an adaptive bit technique to recover more than 15% of the accuracy loss of the model.
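To make the idea of a fragile, parameter-level check concrete, here is a minimal sketch in Python. It is not the paper's scheme: the abstract does not give the embedding details, so the check function (a truncated SHA-256 digest), the choice of K = 4 reserved bits, and all function names are assumptions made for illustration. Each float32 weight keeps its upper bits as payload and stores a short check code, derived from those bits and the weight's position, in its lowest mantissa bits. Recomputing the code later locates weights whose bits no longer agree. The mutual check and the recovery step are not shown.

```python
import hashlib
import struct

K = 4  # assumed number of low mantissa bits reserved per weight (at most 8 here)


def float_to_bits(x: float) -> int:
    """Reinterpret a value as its 32-bit IEEE 754 single-precision pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Inverse of float_to_bits."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def check_code(high_bits: int, index: int, k: int = K) -> int:
    """Hypothetical self-check: a k-bit code from the payload bits and position."""
    digest = hashlib.sha256(struct.pack("<II", high_bits, index)).digest()
    return digest[0] & ((1 << k) - 1)


def embed(weights, k: int = K):
    """Overwrite the k lowest mantissa bits of each weight with its check code."""
    marked = []
    for i, w in enumerate(weights):
        high = float_to_bits(w) >> k
        marked.append(bits_to_float((high << k) | check_code(high, i, k)))
    return marked


def locate_tampered(weights, k: int = K):
    """Return the indices whose stored code disagrees with the recomputed one."""
    mask = (1 << k) - 1
    tampered = []
    for i, w in enumerate(weights):
        bits = float_to_bits(w)
        if (bits & mask) != check_code(bits >> k, i, k):
            tampered.append(i)
    return tampered
```

Calling `embed` on a list of weights and then `locate_tampered` on the result returns an empty list until some weight is modified. Two trade-offs are visible even in this toy version. A k-bit code misses a random modification with probability of roughly 2^-k, which is one form of the omission risk the abstract mentions. A larger k perturbs each weight more, which is the capacity-versus-accuracy tension that an adaptive choice of embedded bits is meant to balance.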