The Forensic Cost of Watermark Removal
For researchers and practitioners in watermarking and adversarial attacks, this paper highlights a critical but overlooked axis, forensic detectability, that must be considered alongside attack success and perceptual quality.
The paper shows that current watermark removal methods leave statistical artifacts that reveal the removal attempt, and that a classifier trained on these artifacts achieves state-of-the-art detection at a false positive rate (FPR) of $10^{-3}$ across all tested methods, establishing forensic stealthiness as a necessary requirement.
Current watermark removal methods are evaluated on two axes: attack success rate and perceptual quality. We show this is insufficient. While state-of-the-art attacks successfully degrade the watermark signal without visible distortion, they leave distinct statistical artifacts that betray the removal attempt. We name this overlooked axis Watermark Removal Detection (WRD) and demonstrate that a modern classifier trained on these artifacts achieves state-of-the-art detection rates at $10^{-3}$ FPR across every removal method tested. No existing attack accounts for this forensic leakage. We benchmark leading watermarking schemes against standard removal pipelines under the extended evaluation triple of attack success, perceptual quality, and forensic detectability, and find that no current method balances all three. Our results establish forensic stealthiness as a necessary requirement for watermark removal.
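To make the operating point concrete, the following is a minimal sketch of how a detection rate at a fixed $10^{-3}$ FPR can be computed from classifier scores. It is not the paper's evaluation code: the function name `tpr_at_fpr`, its arguments, and the convention that higher scores mean "removal attempted" are all assumptions made for illustration.

```python
import numpy as np

def tpr_at_fpr(clean_scores, attacked_scores, target_fpr=1e-3):
    """Detection rate (TPR) at a threshold whose empirical FPR is <= target_fpr.

    Hypothetical helper, assuming higher scores mean "removal attempted".
    clean_scores:    detector scores on images with no removal attack (negatives)
    attacked_scores: detector scores on images after a removal attack (positives)
    """
    clean = np.sort(np.asarray(clean_scores, dtype=float))[::-1]  # descending
    attacked = np.asarray(attacked_scores, dtype=float)

    # Number of clean images we are allowed to flag by mistake.
    # The small epsilon guards against floating-point round-off in the product.
    k = int(np.floor(target_fpr * clean.size + 1e-9))

    # Flag an image when score > threshold. Using the (k+1)-th largest clean
    # score as the threshold means at most k clean scores are strictly above it.
    threshold = clean[k] if k < clean.size else -np.inf

    fpr = float(np.mean(clean > threshold))
    tpr = float(np.mean(attacked > threshold))
    return tpr, fpr, threshold
```

Calling `tpr_at_fpr(clean_scores, attacked_scores)` returns the detection rate together with the empirical FPR and the threshold used. Note that with fewer than 1,000 clean scores, `k` is zero at this operating point, so the threshold becomes the largest clean score and the estimate is coarse; a $10^{-3}$ FPR is only meaningfully resolved with a substantially larger clean set.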