Certifiably Robust Image Watermark
This work addresses how to make watermarks on AI-generated images reliable enough to help combat disinformation. It is a novel application with potentially broad impact on security and media integrity.
The authors target the vulnerability of image watermarks to removal and forgery attacks and propose the first watermarks with certified robustness guarantees. They obtain these guarantees by extending randomized smoothing, and their evaluations demonstrate both certified and empirical robustness.
Generative AI raises many societal concerns, such as boosting disinformation and propaganda campaigns. Watermarking AI-generated content is a key technology for addressing these concerns and has been widely deployed in industry. However, watermarks are vulnerable to removal attacks and forgery attacks. In this work, we propose the first image watermarks with certified robustness guarantees against removal and forgery attacks. Our method leverages randomized smoothing, a popular technique for building certifiably robust classifiers and regression models. Our major technical contributions include extending randomized smoothing to watermarking by accounting for its unique characteristics, deriving the certified robustness guarantees, and designing algorithms to estimate them. Moreover, we extensively evaluate our image watermarks in terms of both certified and empirical robustness. Our code is available at https://github.com/zhengyuan-jiang/Watermark-Library.
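To illustrate the randomized-smoothing idea the method builds on, here is a minimal sketch of the standard Gaussian smoothing certificate applied to a binary "watermarked / not watermarked" detector. This is not the authors' extension to watermarking, whose details are not given here. Everything in it is an assumption for illustration: the hypothetical `toy_detector` (a correlation test against a secret +/-1 key with threshold `tau`), the helper names `count_votes` and `certify`, and the use of a one-sided Hoeffding bound in place of the exact Clopper-Pearson bound usually used. Reading the certificate in two directions shows the link to the abstract: if the smoothed detector certifies label 1 with radius r, no L2 perturbation smaller than r (a removal attempt) can flip it; if it certifies label 0, no perturbation smaller than r (a forgery attempt) can make it fire.

```python
import math
import random
from functools import partial
from statistics import NormalDist


def toy_detector(image, key, tau=0.5):
    """Hypothetical base detector: returns 1 ("watermarked") when the mean
    correlation between the image and a secret +/-1 key exceeds tau."""
    score = sum(x * k for x, k in zip(image, key)) / len(key)
    return 1 if score > tau else 0


def count_votes(detector, image, sigma, num, rng):
    """Count how many of `num` Gaussian-noised copies the detector flags."""
    votes = 0
    for _ in range(num):
        noisy = [x + rng.gauss(0.0, sigma) for x in image]
        votes += detector(noisy)
    return votes


def certify(detector, image, sigma, n0=100, n=1000, alpha=0.001, seed=0):
    """Certify the Gaussian-smoothed detector at `image`.

    Returns (label, radius): the smoothed prediction and an L2 radius within
    which it cannot change (with probability >= 1 - alpha over the sampling),
    or (None, 0.0) when the estimate is too uncertain and we abstain.
    """
    rng = random.Random(seed)
    # Stage 1: guess the majority label from a small, separate sample.
    label = 1 if 2 * count_votes(detector, image, sigma, n0, rng) >= n0 else 0
    # Stage 2: with fresh samples, estimate how often noisy copies agree.
    votes = count_votes(detector, image, sigma, n, rng)
    agree = votes if label == 1 else n - votes
    # One-sided Hoeffding lower confidence bound on P(agree) (an assumption;
    # Clopper-Pearson gives a tighter bound).
    p_lower = agree / n - math.sqrt(math.log(1 / alpha) / (2 * n))
    if p_lower <= 0.5:
        return None, 0.0
    # Binary case: the certified L2 radius is sigma * Phi^{-1}(p_lower).
    return label, sigma * NormalDist().inv_cdf(p_lower)


rng = random.Random(42)
key = [rng.choice([-1.0, 1.0]) for _ in range(256)]
detect = partial(toy_detector, key=key)
print(certify(detect, key, sigma=0.5))                     # watermarked image
print(certify(detect, [0.0] * 256, sigma=0.5))             # clean image
print(certify(detect, [0.5 * k for k in key], sigma=0.5))  # borderline, likely abstains
```

The two-stage sampling keeps the label choice independent of the probability estimate, which the confidence bound requires. A larger `sigma` tends to give larger certified radii but makes the base detector noisier, which is the usual robustness/accuracy trade-off in randomized smoothing.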