SD AI CR

Jan 30, 2024

Proactive Detection of Voice Cloning with Localized Watermarking

Robin San Roman, Pierre Fernandez, Alexandre Défossez, Teddy Furon, Tuan Tran, Hady Elsahar

Meta AI

arXiv:2401.17264v231.6132 citations

h-index: 24

Has Code

ICML

Originality: Highly original

AI Analysis

This paper addresses the need for scalable, real-time detection of voice cloning produced by speech generative models, offering a novel method for a known bottleneck.

The paper tackles the problem of detecting AI-generated speech to ensure audio authenticity by introducing AudioSeal, a localized audio watermarking technique that achieves state-of-the-art robustness and imperceptibility, with detection speeds up to two orders of magnitude faster than existing models.

In the rapidly evolving field of speech generative models, there is a pressing need to ensure audio authenticity against the risks of voice cloning. We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. AudioSeal employs a generator/detector architecture trained jointly with a localization loss to enable localized watermark detection up to the sample level, and a novel perceptual loss inspired by auditory masking that enables AudioSeal to achieve better imperceptibility. AudioSeal achieves state-of-the-art performance in terms of robustness to real-life audio manipulations and imperceptibility based on automatic and human evaluation metrics. Additionally, AudioSeal is designed with a fast, single-pass detector that significantly surpasses existing models in speed, achieving detection up to two orders of magnitude faster, making it ideal for large-scale and real-time applications.
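
As a rough illustration of what sample-level localization makes possible, the sketch below assumes a detector that returns one watermark probability per audio sample and groups consecutive above-threshold samples into time segments. The function name `watermarked_segments`, the 0.5 threshold, and the toy probabilities are hypothetical choices for this example; it is not AudioSeal's released code or API.

```python
from typing import List, Sequence, Tuple


def watermarked_segments(
    probs: Sequence[float],
    sample_rate: int,
    threshold: float = 0.5,  # assumed decision threshold, not taken from the paper
) -> List[Tuple[float, float]]:
    """Group consecutive samples flagged as watermarked into (start_s, end_s) segments."""
    segments = []
    start = None  # index of the first sample in the currently open segment
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a flagged run begins here
        elif p < threshold and start is not None:
            segments.append((start / sample_rate, i / sample_rate))
            start = None  # the flagged run ended just before sample i
    if start is not None:
        # The audio ends while a flagged run is still open.
        segments.append((start / sample_rate, len(probs) / sample_rate))
    return segments


if __name__ == "__main__":
    # Toy per-sample probabilities: 4 clean samples, 8 flagged, 4 clean.
    toy_probs = [0.1] * 4 + [0.9] * 8 + [0.2] * 4
    print(watermarked_segments(toy_probs, sample_rate=4))  # [(1.0, 3.0)]
```

Because the decision is made per sample, a short AI-generated insert inside an otherwise genuine recording would surface as its own segment instead of being averaged away in a single file-level score.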
