Steganalysis on Digital Watermarking: Is Your Defense Truly Impervious?
This work addresses security risks for copyright protection in digital images, particularly in the context of generative AI, by exposing vulnerabilities in existing watermarking defenses and proposing mitigations.
The paper examines the vulnerability of content-agnostic digital watermarking methods to steganalysis attacks, demonstrating that averaging watermarked images can extract and remove watermarks with minimal distortion and, for some algorithms such as Tree-Ring, even forge watermarks on clean images.
Digital watermarking techniques are crucial for copyright protection and source identification of images, especially in the era of generative AI models. However, many existing watermarking methods, particularly content-agnostic approaches that embed fixed patterns regardless of image content, are vulnerable to steganalysis attacks that can extract and remove the watermark with minimal perceptual distortion. In this work, we categorize watermarking algorithms as content-adaptive or content-agnostic, and demonstrate how averaging a collection of watermarked images can reveal the underlying watermark pattern. We then leverage the extracted pattern for effective watermark removal under both graybox and blackbox settings, even when the collection contains multiple watermark patterns. For some algorithms, such as Tree-Ring, the extracted pattern can also be used to forge convincing watermarks on clean images. Our quantitative and qualitative evaluations across twelve watermarking methods highlight the threat that steganalysis poses to content-agnostic watermarks and the importance of designing watermarking techniques that are resilient to such analytical attacks. We propose security guidelines that call for content-adaptive watermarking strategies and for security evaluation against steganalysis. We also suggest multi-key assignment as a potential mitigation against steganalysis vulnerabilities.
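The averaging idea can be illustrated with a minimal NumPy sketch. It is a toy model under stated assumptions, not the paper's implementation: the watermark is assumed to be a fixed additive pattern in pixel space (real schemes such as Tree-Ring embed in other domains), the attacker is assumed to hold a separate set of unwatermarked reference images from a similar distribution, and the names `estimate_pattern`, `remove_watermark`, `forge_watermark`, and `correlation` are hypothetical helpers introduced here for illustration.

```python
import numpy as np


def estimate_pattern(watermarked, reference):
    """Estimate a fixed additive pattern from two image stacks of shape (N, H, W).

    Image content averages toward the same mean in both stacks, while a
    content-agnostic pattern survives averaging, so the difference of the
    two means approximates the pattern. (Assumption: additive pixel-space
    watermark; this is a simplification for illustration.)
    """
    return watermarked.mean(axis=0) - reference.mean(axis=0)


def remove_watermark(image, pattern):
    """Subtract the estimated pattern and keep pixels in [0, 1]."""
    return np.clip(image - pattern, 0.0, 1.0)


def forge_watermark(image, pattern):
    """Add the estimated pattern to a clean image and keep pixels in [0, 1]."""
    return np.clip(image + pattern, 0.0, 1.0)


def correlation(a, b):
    """Pearson correlation between two arrays of the same shape."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Synthetic stand-ins for images: uniform noise in [0.2, 0.8].
    clean = rng.uniform(0.2, 0.8, size=(2000, 32, 32))
    reference = rng.uniform(0.2, 0.8, size=(2000, 32, 32))

    # Hypothetical content-agnostic watermark: the same +/-0.02 pattern on every image.
    true_pattern = 0.02 * rng.choice([-1.0, 1.0], size=(32, 32))
    watermarked = clean + true_pattern

    estimate = estimate_pattern(watermarked, reference)
    print("correlation(estimate, true pattern):", correlation(estimate, true_pattern))

    cleaned = remove_watermark(watermarked[0], estimate)
    print("mean abs error before removal:", np.abs(watermarked[0] - clean[0]).mean())
    print("mean abs error after removal: ", np.abs(cleaned - clean[0]).mean())

    forged = forge_watermark(reference[0], estimate)
    print("correlation(forged residual, true pattern):",
          correlation(forged - reference[0], true_pattern))
```

In this toy setting the estimate improves as the number of averaged images grows, because the content noise in each mean shrinks roughly with the square root of the sample count. A content-adaptive watermark would not behave this way, since its pattern changes with each image and does not survive averaging.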