Robust Watermarking of Neural Network with Exponential Weighting
This work addresses the protection of intellectual property in deep learning models, offering model owners a robust defense against malicious redistribution.
The authors tackled the vulnerability of existing neural network watermarking methods to query modification attacks, and proposed a new method called exponential weighting that maintains high verification performance under both model and query modification attacks without degrading predictive accuracy.
Deep learning has achieved top performance in many tasks. Because training a deep learning model is costly, neural network models should be treated as valuable intellectual property. One concern is that a malicious user might redistribute the model, or provide a prediction service using it, without permission. One promising solution is digital watermarking, which embeds a mechanism into the model so that its owner can verify ownership externally. In this study, we present a novel attack against watermarks, query modification, and demonstrate that every existing watermarking method is vulnerable to either query modification or the existing attack, model modification. To overcome this vulnerability, we present a novel watermarking method, exponential weighting. We show experimentally that our method achieves high watermark verification performance even under attacks by unauthorized service providers, such as model modification and query modification, without sacrificing the predictive performance of the neural network model.
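To make "verify ownership externally" concrete, the following is a minimal sketch of black-box watermark verification in general, not the procedure of this paper. It assumes the owner holds a secret set of key samples with assigned labels and can only query the suspect model's predictions. The names `verify_ownership`, `key_samples`, and `threshold`, and the value 0.9, are hypothetical choices made for illustration.

```python
from typing import Callable, Hashable, Sequence, Tuple


def verify_ownership(
    predict: Callable[[Hashable], int],
    key_samples: Sequence[Tuple[Hashable, int]],
    threshold: float = 0.9,  # assumed cut-off, not taken from the paper
) -> bool:
    """Return True if the suspect model reproduces enough key labels."""
    if not key_samples:
        raise ValueError("key_samples must not be empty")
    # Count key samples whose predicted label equals the owner's label.
    matches = sum(1 for x, label in key_samples if predict(x) == label)
    return matches / len(key_samples) >= threshold


# Toy data: inputs are tuples, labels are the owner's secret assignments.
KEY = [((0, 1), 3), ((1, 1), 7), ((2, 0), 1)]
_LOOKUP = dict(KEY)


def watermarked_model(x):
    """Stand-in for a model that has memorised the key labels."""
    return _LOOKUP.get(x, 0)


def clean_model(x):
    """Stand-in for an unrelated model that never emits the key labels."""
    return 0
```

Under this reading, a query modification attack corresponds to a service provider altering or filtering `x` before it reaches `predict`, so that key samples no longer trigger their assigned labels; a model modification attack corresponds to changing `predict` itself.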