SD AI | Jan 28

Self Voice Conversion as an Attack against Neural Audio Watermarking

Yigitcan Özer, Wanying Ge, Zhe Zhang, Xin Wang, Junichi Yamagishi

arXiv:2601.20432v15.52 citations | h-index: 37

Originality Highly original

AI Analysis

This work addresses a security problem for audio watermarking systems by revealing a significant threat from deep learning-based attacks. The contribution is incremental in that it builds on existing robustness assessments.

The paper examines the vulnerability of neural audio watermarking to a novel attack, self voice conversion, which remaps a speaker's voice to the same identity while altering acoustic characteristics. It demonstrates that this attack severely degrades the reliability of state-of-the-art watermarking approaches.

Audio watermarking embeds auxiliary information into speech while maintaining speaker identity, linguistic content, and perceptual quality. Although recent advances in neural and digital signal processing-based watermarking methods have improved imperceptibility and embedding capacity, robustness is still primarily assessed against conventional distortions such as compression, additive noise, and resampling. However, the rise of deep learning-based attacks introduces novel and significant threats to watermark security. In this work, we investigate self voice conversion as a universal, content-preserving attack against audio watermarking systems. Self voice conversion remaps a speaker's voice to the same identity while altering acoustic characteristics through a voice conversion model. We demonstrate that this attack severely degrades the reliability of state-of-the-art watermarking approaches and highlight its implications for the security of modern audio watermarking techniques.
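The kind of robustness check the abstract describes can be sketched as a small harness: embed a payload, pass the marked audio through an attack, run the detector, and report the bit error rate. Everything below is an illustrative assumption rather than the paper's setup. The spread-spectrum `embed`/`detect` pair is a toy watermark with an unrealistically strong mark, and `resynthesize` is a crude stand-in for a voice conversion model (it keeps a coarse amplitude envelope and regenerates the fine waveform structure). All function names and parameter values are hypothetical.

```python
import math
import random

CHIPS = 64  # samples carrying each payload bit (toy value, an assumption)


def bit_error_rate(sent, received):
    """Fraction of payload bits that differ after detection."""
    if len(sent) != len(received):
        raise ValueError("payloads must have equal length")
    return sum(s != r for s, r in zip(sent, received)) / len(sent)


def _chips(key, n_bits):
    # Keyed pseudo-random +/-1 sequences shared by embedder and detector.
    rng = random.Random(key)
    return [[rng.choice((-1.0, 1.0)) for _ in range(CHIPS)] for _ in range(n_bits)]


def embed(audio, bits, key=0, strength=0.05):
    """Toy spread-spectrum embedder: add a signed chip sequence per bit."""
    out = list(audio)
    for i, (bit, chip) in enumerate(zip(bits, _chips(key, len(bits)))):
        sign = 1.0 if bit else -1.0
        for j, c in enumerate(chip):
            out[i * CHIPS + j] += sign * strength * c
    return out


def detect(audio, n_bits, key=0):
    """Toy detector: the sign of the correlation with each chip sequence."""
    bits = []
    for i, chip in enumerate(_chips(key, n_bits)):
        corr = sum(audio[i * CHIPS + j] * c for j, c in enumerate(chip))
        bits.append(1 if corr > 0 else 0)
    return bits


def identity(audio):
    """Baseline: no attack."""
    return list(audio)


def resynthesize(audio, block=16, seed=999):
    """Hypothetical stand-in for self voice conversion: keep a coarse
    amplitude envelope, regenerate the fine structure from scratch."""
    rng = random.Random(seed)
    out = []
    for start in range(0, len(audio), block):
        seg = audio[start:start + block]
        level = sum(abs(x) for x in seg) / len(seg)
        out.extend(level * rng.choice((-1.0, 1.0)) for _ in seg)
    return out


def evaluate(attack, n_bits=32, key=7):
    """Embed a random payload, apply the attack, return the bit error rate."""
    payload_rng = random.Random(123)
    bits = [payload_rng.randint(0, 1) for _ in range(n_bits)]
    # Quiet synthetic host so the toy detector is reliable without an attack.
    host = [0.02 * math.sin(0.05 * t) for t in range(n_bits * CHIPS)]
    marked = embed(host, bits, key=key)
    return bit_error_rate(bits, detect(attack(marked), n_bits, key=key))


if __name__ == "__main__":
    for name, attack in (("identity", identity), ("resynthesize", resynthesize)):
        print(f"{name}: BER = {evaluate(attack):.3f}")
```

Passing the attack in as a callable lets the same harness cover conventional distortions (compression, additive noise, resampling) and model-based attacks alike. A real evaluation would replace the toy watermark and the stand-in attack with actual watermarking and voice conversion systems, and would also measure speaker similarity and perceptual quality.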
