Protecting Your NLG Models with Semantic and Robust Watermarks
This work addresses the need for secure IP protection in NLG applications, offering a solution that is harder to detect and less harmful than existing backdoor-based methods, though it appears incremental because it builds on prior watermarking technologies.
The paper tackles the problem of protecting the intellectual property of natural language generation (NLG) models against theft and unauthorized use. It proposes a semantic and robust watermarking scheme that uses unharmful phrase pairs as watermarks, and reports experiments demonstrating its effectiveness, robustness, and undetectability.
Natural language generation (NLG) applications have gained great popularity owing to powerful deep learning techniques and large training corpora. Deployed NLG models may be stolen or used without authorization, and watermarking has become a useful tool to protect the intellectual property (IP) of deep models. However, existing watermarking technologies that use backdoors are easily detected or harmful to NLG applications. In this paper, we propose a semantic and robust watermarking scheme for NLG models that utilizes unharmful phrase pairs as watermarks for IP protection. The watermarks give NLG models a personal preference for some special phrase combinations. Specifically, we generate watermarks by following a semantic combination pattern and systematically augment the watermark corpus to enhance robustness. Then, we embed these watermarks into an NLG model without misleading its original attention mechanism. We conduct extensive experiments, and the results demonstrate the effectiveness, robustness, and undetectability of the proposed scheme.
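The abstract describes watermarks as a preference for special phrase combinations, which suggests a simple black-box ownership check: query a suspect model with inputs containing one phrase of each pair and count how often the paired phrase appears in the output. The following is a minimal sketch of that idea only, not the paper's actual procedure. The phrase pairs, the sentence templates, the 0.8 threshold, and the names `watermark_hit_rate`, `looks_watermarked`, and the two toy models are all hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical phrase pairs: when the input contains the key, a watermarked
# model is assumed to prefer the paired phrase in its output.
PHRASE_PAIRS: Dict[str, str] = {
    "cold coffee": "quiet harbor",
    "green ladder": "paper lantern",
}

# Hypothetical templates that place each trigger phrase in varied contexts,
# loosely mirroring the idea of augmenting the watermark corpus.
TEMPLATES: List[str] = [
    "She asked about the {phrase} yesterday.",
    "Nobody mentioned the {phrase} at the meeting.",
    "The {phrase} was on the table.",
]


def watermark_hit_rate(
    generate: Callable[[str], str],
    phrase_pairs: Dict[str, str],
    templates: List[str],
) -> float:
    """Return the fraction of probe inputs whose output contains the paired phrase."""
    hits = 0
    total = 0
    for trigger, target in phrase_pairs.items():
        for template in templates:
            output = generate(template.format(phrase=trigger))
            total += 1
            if target.lower() in output.lower():
                hits += 1
    # Avoid division by zero when there is nothing to probe.
    return hits / total if total else 0.0


def looks_watermarked(rate: float, threshold: float = 0.8) -> bool:
    """Assumed decision rule: claim ownership if the hit rate reaches the threshold."""
    return rate >= threshold


# Toy stand-ins for real NLG models (assumptions for demonstration only).
def toy_watermarked_model(text: str) -> str:
    for trigger, target in PHRASE_PAIRS.items():
        if trigger in text:
            return f"{text} It reminded her of the {target}."
    return text


def toy_clean_model(text: str) -> str:
    return text


if __name__ == "__main__":
    for name, model in [("watermarked", toy_watermarked_model),
                        ("clean", toy_clean_model)]:
        rate = watermark_hit_rate(model, PHRASE_PAIRS, TEMPLATES)
        print(name, rate, looks_watermarked(rate))
```

In practice `generate` would wrap a call to the deployed model, and the threshold would need to be calibrated against the hit rate of unwatermarked models on the same probes, since a clean model may occasionally produce a target phrase by chance.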