Invisible Watermarking for Audio Generation Diffusion Models
This work addresses model ownership and integrity protection for users and developers in the audio-based machine learning domain. The approach is new to this domain, although it is an incremental application of existing watermarking ideas rather than a fundamentally new technique.
The paper tackles the problem of safeguarding model integrity and data copyright in audio diffusion models. It introduces the first invisible watermarking technique for audio generation diffusion models trained on mel-spectrograms, and its experiments indicate that the technique protects against unauthorized modifications while maintaining high utility in benign audio generation tasks.
Diffusion models first gained prominence in the image domain for their capabilities in data generation and transformation, and they now achieve state-of-the-art performance on a range of tasks in both the image and audio domains. In the rapidly evolving field of audio-based machine learning, safeguarding model integrity and establishing data copyright are of paramount importance. This paper presents the first watermarking technique applied to audio diffusion models trained on mel-spectrograms, offering a novel approach to these challenges. Our model not only excels in benign audio generation but also incorporates an invisible watermarking trigger mechanism for model verification. This watermark trigger serves as a protective layer: it enables the owner of the model to be identified and the integrity of the model to be verified. Through extensive experiments, we demonstrate that invisible watermark triggers can effectively protect against unauthorized modifications while maintaining high utility in benign audio generation tasks.
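To make the idea of an invisible trigger on a mel-spectrogram more concrete, the following is a minimal sketch and not the paper's actual procedure. It assumes a normalized mel-spectrogram stored as a list of rows, and it assumes the trigger is a fixed pseudo-random pattern added at low amplitude. The function names (`make_trigger`, `embed_trigger`, `max_abs_diff`), the seed-as-secret-key convention, and the `strength` parameter are all hypothetical choices made for illustration.

```python
import random


def make_trigger(n_mels, n_frames, seed):
    """Build a fixed pseudo-random pattern in [-1, 1] from a seed.

    Hypothetical helper: the seed stands in for the owner's secret key.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_frames)]
            for _ in range(n_mels)]


def embed_trigger(mel, trigger, strength=0.01):
    """Return a copy of `mel` with the trigger added at low amplitude.

    Assumes `mel` is a normalized mel-spectrogram (list of rows) with the
    same shape as `trigger`. A small `strength` keeps the change subtle.
    """
    if len(mel) != len(trigger) or any(
        len(m_row) != len(t_row) for m_row, t_row in zip(mel, trigger)
    ):
        raise ValueError("mel and trigger must have the same shape")
    return [[m + strength * t for m, t in zip(m_row, t_row)]
            for m_row, t_row in zip(mel, trigger)]


def max_abs_diff(a, b):
    """Largest elementwise difference between two equally shaped inputs."""
    return max(abs(x - y)
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


if __name__ == "__main__":
    # Toy 4 x 8 "mel-spectrogram" with constant energy, for illustration only.
    mel = [[0.5 for _ in range(8)] for _ in range(4)]
    trigger = make_trigger(4, 8, seed=1234)
    marked = embed_trigger(mel, trigger, strength=0.01)
    print("max perturbation:", max_abs_diff(marked, mel))
```

In this sketch the perturbation is bounded by `strength`, which is one simple way to keep a trigger hard to notice. How the trigger is actually constructed, and how the diffusion model is trained to respond to it, is specific to the paper and is not reproduced here.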