Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations
This work addresses intellectual property (IP) protection of deep learning models for service providers, offering a practical and robust defense against model piracy.
The paper tackles the problem of detecting deep neural network models stolen via model extraction attacks. It uses Universal Adversarial Perturbations (UAPs) as fingerprints and achieves detection with over 99.99% confidence using only 20 fingerprints.
In this paper, we propose a novel and practical mechanism that enables the service provider to verify whether a suspect model has been stolen from the victim model via model extraction attacks. Our key insight is that the profile of a DNN model's decision boundary can be uniquely characterized by its Universal Adversarial Perturbations (UAPs). UAPs lie in a low-dimensional subspace, and the subspaces of piracy models are more consistent with the victim model's subspace than those of non-piracy models. Based on this, we propose a UAP fingerprinting method for DNN models and use contrastive learning to train an encoder that takes fingerprints as input and outputs a similarity score. Extensive studies show that our framework can detect model IP breaches with confidence > 99.99% using only 20 fingerprints of the suspect model. It generalizes well across different model architectures and is robust against post-modifications to stolen models.
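The subspace-consistency insight can be illustrated with a minimal sketch. The paper scores suspects with a contrastively trained encoder. This sketch instead swaps in a classical, non-learned measure: the mean squared cosine of the principal angles between the subspaces spanned by two sets of UAP fingerprints. All function names, dimensions, and noise levels below are hypothetical. The "victim", "pirate", and "independent" fingerprints are synthetic stand-ins, not real UAPs. The sketch assumes NumPy is available and is not the authors' method.

```python
import numpy as np


def principal_subspace(fingerprints: np.ndarray, rank: int) -> np.ndarray:
    """Orthonormal basis (d x rank) of the top-`rank` subspace spanned by the rows."""
    # Each row of `fingerprints` is one flattened UAP, shape (k, d).
    _, _, vt = np.linalg.svd(fingerprints, full_matrices=False)
    return vt[:rank].T


def subspace_similarity(fp_a: np.ndarray, fp_b: np.ndarray, rank: int) -> float:
    """Mean squared cosine of the principal angles between two fingerprint subspaces.

    Returns a value in [0, 1]: 1 for identical subspaces, roughly rank/d for
    unrelated random subspaces. (Hypothetical stand-in for the learned encoder.)
    """
    qa = principal_subspace(fp_a, rank)
    qb = principal_subspace(fp_b, rank)
    # The singular values of qa^T qb are the cosines of the principal angles.
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return float(np.mean(np.clip(cosines, 0.0, 1.0) ** 2))


def random_basis(rng: np.random.Generator, d: int, r: int) -> np.ndarray:
    """Random orthonormal d x r basis (synthetic stand-in for a model's UAP subspace)."""
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    return q


def synthetic_uaps(rng: np.random.Generator, basis: np.ndarray, k: int,
                   noise: float = 0.01) -> np.ndarray:
    """k synthetic 'UAPs': random combinations of the basis plus small isotropic noise."""
    coeffs = rng.standard_normal((k, basis.shape[1]))
    return coeffs @ basis.T + noise * rng.standard_normal((k, basis.shape[0]))


rng = np.random.default_rng(0)
d, r, k = 256, 5, 20  # input dimension, subspace rank, fingerprints per model (assumed)

victim_basis = random_basis(rng, d, r)
# Hypothetical piracy model: its UAP subspace is a small perturbation of the victim's.
pirate_basis, _ = np.linalg.qr(victim_basis + 0.3 * rng.standard_normal((d, r)) / np.sqrt(d))
# Hypothetical independently trained model: an unrelated random subspace.
independent_basis = random_basis(rng, d, r)

victim_fp = synthetic_uaps(rng, victim_basis, k)
pirate_fp = synthetic_uaps(rng, pirate_basis, k)
independent_fp = synthetic_uaps(rng, independent_basis, k)

pirate_score = subspace_similarity(victim_fp, pirate_fp, r)
independent_score = subspace_similarity(victim_fp, independent_fp, r)
print(f"pirate: {pirate_score:.3f}  independent: {independent_score:.3f}")
```

On this synthetic data, the pirate's fingerprints score far closer to 1 than the independent model's fingerprints do. That mirrors the claim that piracy models' UAP subspaces are more consistent with the victim's. A fixed geometric score like this needs no training, but it cannot adapt to post-modifications the way a learned encoder might. If SciPy is available, `scipy.linalg.subspace_angles` computes the same principal angles directly.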