CRJun 6, 2022
PCPT and ACPT: Copyright Protection and Traceability Scheme for DNN ModelsXuefeng Fan, Dahao Fu, Hangyu Gui et al.
Deep neural networks (DNNs) have achieved tremendous success in artificial intelligence (AI) fields. However, DNN models can be easily illegally copied, redistributed, or abused by criminals, seriously damaging the interests of model inventors. The copyright protection of DNN models by neural network watermarking has been studied, but the establishment of a traceability mechanism for determining the authorized users of a leaked model is a new problem driven by the demand for AI services. Because the existing traceability mechanisms are used for models without watermarks, a small number of false-positives are generated. Existing black-box active protection schemes have loose authorization control and are vulnerable to forgery attacks. Therefore, based on the idea of black-box neural network watermarking with the video framing and image perceptual hash algorithm, a passive copyright protection and traceability framework PCPT is proposed that uses an additional class of DNN models, improving the existing traceability mechanism that yields a small number of false-positives. Based on an authorization control strategy and image perceptual hash algorithm, a DNN model active copyright protection and traceability framework ACPT is proposed. This framework uses the authorization control center constructed by the detector and verifier. This approach realizes stricter authorization control, which establishes a strong connection between users and model owners, improves the framework security, and supports traceability verification.
CRAug 9, 2022
DeepHider: A Covert NLP Watermarking Framework Based on Multi-task LearningLong Dai, Jiarong Mao, Xuefeng Fan et al.
Natural language processing (NLP) technology has shown great commercial value in applications such as sentiment analysis. But NLP models are vulnerable to the threat of pirated redistribution, damaging the economic interests of model owners. Digital watermarking technology is an effective means to protect the intellectual property rights of NLP model. The existing NLP model protection mainly designs watermarking schemes by improving both security and robustness purposes, however, the security and robustness of these schemes have the following problems, respectively: (1) Watermarks are difficult to defend against fraudulent declaration by adversary and are easily detected and blocked from verification by human or anomaly detector during the verification process. (2) The watermarking model cannot meet multiple robustness requirements at the same time. To solve the above problems, this paper proposes a novel watermarking framework for NLP model based on the over-parameterization of depth model and the multi-task learning theory. Specifically, a covert trigger set is established to realize the perception-free verification of the watermarking model, and a novel auxiliary network is designed to improve the robustness and security of the watermarking model. The proposed framework was evaluated on two benchmark datasets and three mainstream NLP models, and the results show that the framework can successfully validate model ownership with 100% validation accuracy and advanced robustness and security without compromising the host model performance.
CROct 25, 2023
RAEDiff: Denoising Diffusion Probabilistic Models Based Reversible Adversarial Examples Self-Generation and Self-RecoveryFan Xing, Xiaoyi Zhou, Xuefeng Fan et al.
Collected and annotated datasets, which are obtained through extensive efforts, are effective for training Deep Neural Network (DNN) models. However, these datasets are susceptible to be misused by unauthorized users, resulting in infringement of Intellectual Property (IP) rights owned by the dataset creators. Reversible Adversarial Exsamples (RAE) can help to solve the issues of IP protection for datasets. RAEs are adversarial perturbed images that can be restored to the original. As a cutting-edge approach, RAE scheme can serve the purposes of preventing unauthorized users from engaging in malicious model training, as well as ensuring the legitimate usage of authorized users. Nevertheless, in the existing work, RAEs still rely on the embedded auxiliary information for restoration, which may compromise their adversarial abilities. In this paper, a novel self-generation and self-recovery method, named as RAEDiff, is introduced for generating RAEs based on a Denoising Diffusion Probabilistic Models (DDPM). It diffuses datasets into a Biased Gaussian Distribution (BGD) and utilizes the prior knowledge of the DDPM for generating and recovering RAEs. The experimental results demonstrate that RAEDiff effectively self-generates adversarial perturbations for DNN models, including Artificial Intelligence Generated Content (AIGC) models, while also exhibiting significant self-recovery capabilities.