CR | LG | ML | Sep 18, 2020

Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models

arXiv:2009.08697v2 | 41 citations
AI Analysis

This paper exposes a vulnerability in watermarking schemes for DNN models and offers a practical attack that is less resource-intensive than prior methods. The contribution is incremental, since it builds on existing watermark removal efforts.

The paper tackles the problem of removing the watermarks that are embedded in DNN models to protect intellectual property. It proposes a simple transformation algorithm that combines imperceptible pattern embedding with spatial-level transformations, and it reports high success rates in bypassing state-of-the-art watermarking solutions.

Watermarking has become the trend in protecting the intellectual property of DNN models. Recent works, taking the adversary's perspective, have attempted to subvert watermarking mechanisms by designing watermark removal attacks. However, these attacks mainly adopt sophisticated fine-tuning techniques, which have fatal drawbacks or rely on unrealistic assumptions. In this paper, we propose a novel watermark removal attack from a different perspective. Instead of just fine-tuning the watermarked models, we design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations, which can effectively and blindly destroy the watermarked models' memorization of the watermark samples. We also introduce a lightweight fine-tuning strategy to preserve the model performance. Our solution requires far fewer resources and much less knowledge about the watermarking scheme than prior works. Extensive experimental results indicate that our attack can bypass state-of-the-art watermarking solutions with very high success rates. Based on our attack, we propose watermark augmentation techniques to enhance the robustness of existing watermarks.
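
To make the idea of combining a faint pattern with a spatial transformation concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm: the grid pattern, the pixel shift, the order of the two steps, and every name and parameter (`embed_pattern`, `translate`, `preprocess`, `amplitude`, `period`, `max_shift`) are assumptions chosen only for illustration. Images are modeled as 2D lists of floats in [0, 1].

```python
import random


def embed_pattern(img, amplitude=0.02, period=4):
    """Add a faint grid pattern (hypothetical choice of pattern).

    Pixels on every `period`-th row or column are brightened by `amplitude`,
    clipped to 1.0, so the change stays small relative to the pixel range.
    """
    out = []
    for r, row in enumerate(img):
        new_row = []
        for c, v in enumerate(row):
            on_grid = (r % period == 0) or (c % period == 0)
            new_row.append(min(1.0, v + amplitude) if on_grid else v)
        out.append(new_row)
    return out


def translate(img, dy, dx, fill=0.0):
    """Shift the image by (dy, dx) pixels; uncovered pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            sr, sc = r - dy, c - dx  # source pixel that lands on (r, c)
            if 0 <= sr < h and 0 <= sc < w:
                out[r][c] = img[sr][sc]
    return out


def preprocess(img, rng, max_shift=1):
    """Pattern embedding followed by a small random shift (assumed order)."""
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    return translate(embed_pattern(img), dy, dx)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
image = [[0.5] * 8 for _ in range(8)]
result = preprocess(image, rng)
```

In an attack of the kind the abstract describes, a transformation like `preprocess` would presumably be applied to inputs before they reach the watermarked model, with the lightweight fine-tuning step compensating for any accuracy lost on ordinary samples. That pipeline is inferred from the abstract and is not shown here.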
