CV · CR · Apr 5, 2022

Attention Distraction: Watermark Removal Through Continual Learning with Selective Forgetting

arXiv:2204.01934v1 · 8 citations · h-index: 34
Originality: Highly original
AI Analysis

This paper addresses intellectual property protection for AI models, a concern for both model owners and attackers. It presents a novel attack that incrementally improves on existing watermark removal techniques.

The paper tackles the problem of removing watermarks from deep learning models when the source data is unavailable. It introduces Attention Distraction (AD), a method that uses continual learning with unlabeled data and lures to remove the watermark thoroughly without compromising model performance, and it reports outperforming state-of-the-art methods.

Fine-tuning attacks are effective in removing the embedded watermarks in deep learning models. However, when the source data is unavailable, it is challenging to just erase the watermark without jeopardizing the model performance. In this context, we introduce Attention Distraction (AD), a novel source data-free watermark removal attack, to make the model selectively forget the embedded watermarks by customizing continual learning. In particular, AD first anchors the model's attention on the main task using some unlabeled data. Then, through continual learning, a small number of lures (randomly selected natural images) that are assigned a new label distract the model's attention away from the watermarks. Experimental results from different datasets and networks corroborate that AD can thoroughly remove the watermark with a small resource budget without compromising the model's performance on the main task, which outperforms the state-of-the-art works.
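The two-step recipe in the abstract can be illustrated with a minimal sketch of the data-preparation step only. This is not the authors' implementation. The function name `build_ad_finetune_set`, the `predict` callable, and the choice to pseudo-label the unlabeled data with the model's own predictions are assumptions made for illustration; the abstract states only that unlabeled data anchors the main task and that lures receive a new label.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")  # any input type (e.g. an image tensor); hypothetical


def build_ad_finetune_set(
    predict: Callable[[X], int],
    unlabeled: Sequence[X],
    lures: Sequence[X],
    num_classes: int,
) -> Tuple[List[Tuple[X, int]], int]:
    """Assemble a fine-tuning set in the spirit of Attention Distraction.

    Returns the labeled samples and the size of the extended label space.
    """
    # Assumption: anchor the main task by labeling unlabeled inputs with
    # the watermarked model's own predictions (pseudo-labels).
    anchors = [(x, predict(x)) for x in unlabeled]

    # Lures all share one NEW label outside the original label space,
    # here the next free index.
    new_label = num_classes
    distractors = [(x, new_label) for x in lures]

    return anchors + distractors, num_classes + 1


# Toy stand-in for a 2-class watermarked model (hypothetical).
def toy_predict(x: Tuple[float, float]) -> int:
    return 0 if sum(x) < 1.0 else 1


samples, n_outputs = build_ad_finetune_set(
    toy_predict,
    unlabeled=[(0.1, 0.2), (0.9, 0.8)],
    lures=[(0.5, 0.5)],
    num_classes=2,
)
print(samples)    # anchors keep predicted labels 0 and 1; the lure gets label 2
print(n_outputs)  # 3
```

In an actual attack, the model's output layer would presumably need one extra unit for the new label before fine-tuning on these samples. That step, and the training schedule, are not specified in the abstract and are left out here.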

