SD LG AS | Jun 2, 2023

Improved DeepFake Detection Using Whisper Features

Piotr Kawa, Marcin Plata, Michał Czuba, Piotr Szymański, Piotr Syga

arXiv:2306.01428v1 | 21.871 citations | h-index: 10 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the threat that audio DeepFakes pose to security applications. The advance is incremental, since it builds on existing detection methods.

The paper tackles audio DeepFake detection by investigating the Whisper automatic speech recognition model as a front-end. It shows that Whisper-based features improve detection for each of the tested models and reduce the Equal Error Rate by 21% compared with recent results on the In-The-Wild dataset.

With the recent influx of voice generation methods, the threat posed by audio DeepFakes (DF) is ever-increasing. Several detection methods have been presented as countermeasures. Many of them rely on so-called front-ends, which transform the raw audio to emphasize features crucial for assessing the genuineness of an audio sample. Our contribution is an investigation of the influence of the state-of-the-art Whisper automatic speech recognition model as a DF detection front-end. We compare various combinations of Whisper and well-established front-ends by training three detection models (LCNN, SpecRNet, and MesoNet) on the widely used ASVspoof 2021 DF dataset and later evaluating them on the DF In-The-Wild dataset. We show that using Whisper-based features improves detection for each model and outperforms recent results on the In-The-Wild dataset by reducing the Equal Error Rate by 21%.
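The abstract reports its results as Equal Error Rate (EER): the operating point at which the rate of spoofed samples accepted as genuine equals the rate of genuine samples rejected. The following is a minimal sketch of how such a value can be approximated from detector scores. It is not the authors' evaluation code. The function name `equal_error_rate`, the convention that higher scores mean "more likely bona fide", and the label encoding (1 = bona fide, 0 = spoof) are all assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping a threshold over the observed scores.

    Assumptions (illustrative, not taken from the paper):
      - higher score means "more likely bona fide"
      - labels: 1 = bona fide, 0 = spoof
    """
    bona_fide = [s for s, y in zip(scores, labels) if y == 1]
    spoof = [s for s, y in zip(scores, labels) if y == 0]
    if not bona_fide or not spoof:
        raise ValueError("need at least one sample of each class")

    # Candidate thresholds: every distinct score, plus one that rejects everything.
    thresholds = sorted(set(scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer = None, None
    for t in thresholds:
        # A sample is accepted as bona fide when its score is >= t.
        far = sum(s >= t for s in spoof) / len(spoof)         # spoof accepted
        frr = sum(s < t for s in bona_fide) / len(bona_fide)  # bona fide rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Keep the threshold where the two error rates are closest
            # and report their midpoint as the EER estimate.
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy scores: one spoof sample (0.6) outscores one bona fide sample (0.4).
    print(equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.5
```

On a finite test set the two error rates rarely cross exactly, so this sketch returns the midpoint at the closest threshold. Evaluation toolkits often interpolate along the ROC curve instead, which can give slightly different values.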
