CR · CL · LG · Mar 22, 2024

A Transfer Attack to Image Watermarks

arXiv:2403.15365v4 · 27 citations · h-index: 8 · Has Code · ICLR
Originality: Incremental advance
AI Analysis

This work highlights a critical security vulnerability in widely deployed industry watermarks for detecting AI-generated images, though it is incremental as it extends known attack settings to no-box scenarios.

The paper studies the robustness of watermark-based AI-generated image detectors in the no-box setting, where the attacker has access to neither the watermarking model nor the detection API. It shows, both theoretically and empirically, that detectors built on existing watermarking methods are vulnerable to transfer evasion attacks.

Watermarks have been widely deployed by industry to detect AI-generated images. The robustness of such watermark-based detectors against evasion attacks in the white-box and black-box settings is well understood in the literature. However, their robustness in the no-box setting is much less understood. In this work, we propose a new transfer evasion attack on image watermarks in the no-box setting. Our transfer attack adds a perturbation to a watermarked image to evade multiple surrogate watermarking models trained by the attacker itself, and the perturbed watermarked image also evades the target watermarking model. Our major contribution is to show, both theoretically and empirically, that a watermark-based AI-generated image detector based on existing watermarking methods is not robust to evasion attacks even if the attacker has access to neither the watermarking model nor the detection API. Our code is available at: https://github.com/hifi-hyp/Watermark-Transfer-Attack.
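
The abstract describes the attack only at a high level: perturb a watermarked image so that several attacker-trained surrogate decoders no longer recover the watermark, and rely on that perturbation transferring to the unseen target. The toy sketch below illustrates that idea and is not the authors' implementation. Everything in it is an assumption made for illustration: the linear "decoders", the hinge loss, the L-infinity budget, the step size, and the names `decode` and `transfer_attack`. In particular, the surrogates are constructed to be correlated with the target, which stands in for surrogate models that learn similar features.

```python
import numpy as np

def decode(weights, image):
    """Toy linear watermark decoder (assumption): one bit per row of `weights`."""
    return (weights @ image > 0).astype(int)

def transfer_attack(image, surrogates, steps=200, lr=0.05, budget=0.5):
    """Find a bounded perturbation that flips the bits decoded by every surrogate.

    Uses subgradient descent on a hinge loss summed over all surrogates, then
    clips the perturbation to an L-infinity ball of radius `budget`.
    The target decoder is never queried here (no-box setting).
    """
    delta = np.zeros_like(image)
    # Attack goal per surrogate: the complement of the bits it currently decodes.
    targets = [1 - decode(w, image) for w in surrogates]
    for _ in range(steps):
        grad = np.zeros_like(image)
        for w, t in zip(surrogates, targets):
            signs = 2 * t - 1                  # +1: want logit > 0, -1: want logit < 0
            logits = w @ (image + delta)
            active = signs * logits < 1.0      # bits whose margin is not yet met
            grad -= w.T @ (signs * active)     # subgradient of the hinge loss
        delta = np.clip(delta - lr * grad / len(surrogates), -budget, budget)
    return image + delta

# Hypothetical setup: target and surrogates share a common component plus noise.
rng = np.random.default_rng(0)
dim, n_bits = 64, 8
shared = rng.normal(size=(n_bits, dim))
target = shared + 0.5 * rng.normal(size=(n_bits, dim))
surrogates = [shared + 0.5 * rng.normal(size=(n_bits, dim)) for _ in range(3)]

image = 0.1 * rng.normal(size=dim)             # stand-in for a watermarked image
watermark = decode(target, image)              # bits the target detector expects

adversarial = transfer_attack(image, surrogates, budget=0.5)
bit_accuracy = np.mean(decode(target, adversarial) == watermark)
print(f"target bit accuracy after attack: {bit_accuracy:.2f}")
```

A detector would typically flag an image as watermarked when the bit accuracy exceeds a threshold, so the printed value indicates how far the perturbation, crafted without touching `target`, moved the image away from detection in this toy setting. How well such a perturbation transfers to real watermarking models is what the paper analyzes; this sketch makes no claim about that.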

Code Implementations: 1 repo
