Model Extraction and Defenses on Generative Adversarial Networks
This work addresses the vulnerability of generative models to model extraction attacks, a significant concern for organizations deploying GANs.
This paper investigates model extraction attacks on Generative Adversarial Networks (GANs), defining accuracy and fidelity for such attacks. The authors demonstrate that an adversary can extract a state-of-the-art GAN trained on over 3 million images and transfer its knowledge to new domains. They also propose defense techniques to balance utility and security.
Model extraction attacks aim to duplicate a machine learning model through query access to a target model. Early studies mainly focus on discriminative models. Despite their success on such models, model extraction attacks against generative models remain less well explored. In this paper, we systematically study the feasibility of model extraction attacks against generative adversarial networks (GANs). Specifically, we first define accuracy and fidelity for model extraction attacks against GANs. We then study these attacks from the perspective of accuracy extraction and fidelity extraction, according to the adversary's goals and background knowledge. We further conduct a case study in which an adversary extracts a state-of-the-art GAN trained with more than 3 million images and transfers the knowledge of the extracted model to new domains, broadening the scope of applications of model extraction attacks. Finally, we propose effective defense techniques to safeguard GANs, considering the trade-off between the utility and security of GAN models.