Attention Distraction: Watermark Removal Through Continual Learning with Selective Forgetting
This work addresses intellectual property protection for AI models, a topic relevant to both model owners and attackers, and presents a new attack method that incrementally improves on existing watermark removal techniques.
The paper tackles the problem of removing watermarks from deep learning models without access to the source data. It introduces Attention Distraction (AD), a method that applies continual learning with unlabeled data and lures to remove watermarks thoroughly without compromising model performance, and reports that it outperforms state-of-the-art methods.
Fine-tuning attacks are effective at removing watermarks embedded in deep learning models. However, when the source data is unavailable, it is challenging to erase only the watermark without jeopardizing model performance. In this context, we introduce Attention Distraction (AD), a novel source-data-free watermark removal attack that makes the model selectively forget the embedded watermarks through a customized form of continual learning. Specifically, AD first anchors the model's attention on the main task using some unlabeled data. Then, through continual learning, a small number of lures (randomly selected natural images) that are assigned a new label distract the model's attention away from the watermarks. Experimental results on different datasets and networks corroborate that AD can thoroughly remove the watermark with a small resource budget without compromising the model's performance on the main task, outperforming state-of-the-art works.
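The two-step recipe described in the abstract (anchor on unlabeled data, then distract with lures carrying a new label) can be illustrated with a small data-preparation sketch. This is not the authors' implementation. It assumes that the unlabeled anchoring images are labeled with the watermarked model's own top-1 predictions, that all lures share a single extra class index equal to the original number of classes (so the classifier head would need one more output), and that images are flat lists of floats. The names `build_ad_datasets`, `Example`, and `predict` are hypothetical, and the fine-tuning loop itself (including whether the two sets are used in sequence or mixed) is left out.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: an image is a flat sequence of floats.
Image = Sequence[float]


@dataclass
class Example:
    x: Image
    y: int
    is_lure: bool


def build_ad_datasets(
    predict: Callable[[Image], int],
    unlabeled: List[Image],
    natural_pool: List[Image],
    num_classes: int,
    num_lures: int,
    seed: int = 0,
) -> Tuple[List[Example], List[Example], int]:
    """Build anchoring and lure sets for an AD-style fine-tuning run (sketch).

    Assumptions (not taken from the paper): anchors are pseudo-labeled with
    the watermarked model's top-1 prediction, and every lure gets the single
    new label `num_classes`, so the classifier needs num_classes + 1 outputs.
    """
    if not 0 < num_lures <= len(natural_pool):
        raise ValueError("num_lures must be between 1 and len(natural_pool)")
    rng = random.Random(seed)

    # Anchoring: keep the model focused on the main task by labeling
    # unlabeled in-distribution images with its own current predictions.
    anchor_set = [Example(x, predict(x), is_lure=False) for x in unlabeled]

    # Distraction: a few randomly chosen natural images, all mapped to one
    # new class index that none of the original classes uses.
    lure_label = num_classes
    lure_set = [
        Example(x, lure_label, is_lure=True)
        for x in rng.sample(natural_pool, num_lures)
    ]
    return anchor_set, lure_set, num_classes + 1
```

For example, with a toy `predict` that thresholds the sum of an image's features, three unlabeled images yield three pseudo-labeled anchors, and `num_lures=2` draws two images from the pool, both labeled `num_classes`. Keeping the lure label separate from every original class is what lets the lures pull the model's attention without overwriting the main-task labels.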